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Many  Monte  Carlo  simulation  problems  lend  themselves  readily  to  the  applica¬ 
tion  of  variance  reduction  techniques.  These  techniques  can  result  in  great  improve 
>  lents  in  simulation  efficiency.  This  document  describes  the  basic  concepts  of 
v  ariance  reduction  (Part  I),  and  a  methodology  for  application  of  variance  reduc- 
I  tion  techniques  is  presented  in  Part  II.  Appendices  include  the  basic  analytical 
expressions  for  application  of  variance  reduction  schemes  as  well  as  an  abstracted 
bibliography. 

The  techniques  considered  here  include  importance  sampling,  Russian  roulette 
end  splitting,  systematic  sampling,  stratified  sampling,  expected  values,  statistical 
estimation,  correlated  sampling,  history  reanalysis,  control  variates,  antithetic 
variates,  regression,  sequential  sampling,  adjoint  formulation,  transformations, 
orthonormal  and  conditional  Monte  Carlo.  Emphasis  has  been  placed  on  presenta¬ 
tion  of  the  material  for  application  by  the  general  user.  This  has  been  accomplish¬ 
ed  by  presenting  a  step  by  step  procedure  for  selection  and  application  of  the  appro¬ 
priate  technique  (s)  for  a  given  problem. 
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ABSTRACT 


Many  Monte  Carlo  simulation  problems  lend  themselves  readily 
to  the  application  of  variance  reduction  techniques.  These  techniques 
can  result  in  great  improvements  in  simulation  efficiency.  This  docu¬ 
ment  describes  the  basic  concepts  of  variance  reduction  (Part  I),  and  a 
methodology  for  application  of  variance  reduction  techniques  is  presented 
in  Part  n.  Appendices  include  the  basic  analytical  expressions  for 
application  of  variance  reduction  schemes  as  well  as  an  abstracted 
bibliography. 

The  techniques  considered  here  include  importance  sampling, 
Russian  roulette  and  splitting,  systematic  sampling,  stratified  sampling, 
expected  values,  statistical  estimation,  correlated  sampling,  history 
reanalysis,  control  variates,  antithetic  variates,  regression,  sequential 
sampling,  adjoint  formulation,  transformations,  orthonormal  and  con¬ 
ditional  Monte  Carlo.  Emphasis  has  been  placed  on  presentation  of  the 
material  for  application  by  the  general  user.  This  has  been  accomplished 
by  presenting  a  step  by  step  procedure  for  selection  and  application  of 
the  appropriate  technique  (s)  for  a  given  problem. 
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PART  I 


BASIC  CONCEPTS  OF  VARIANCE  REDUCTION 


EXECUTIVE  SUMMARY 


Monte  Carlo  simulation  is  one  of  the  most  powerful  and  commonly 
used  techniques  for  analyzing  complex  physical  problems.  Applications  can 
be  found  in  many  diverse  areas  from  radiation  transport  to  river  basin 
modeling.  Important  Navy  applications  include  analysis  of  antisubmarine 
warfare  exercises  and  operations,  prediction  of  aircraft  or  sensor  perform¬ 
ance,  tactical  analysis,  and  matrix  game  solutions  where  random  processes 
are  considered  to  be  of  particular  importance.  The  range  of  applications  has 
been  broadening  and  the  size,  complexity,  and  computational  effort  required 
have  been  increasing.  However,  such  developments  are  expected  and  desir¬ 
able  since  increased  realism  is  concomitant  with  more  complex  and  extensive 
problem  descriptions. 

In  recognition  of  such  trends,  the  requirements  for  improved  simula¬ 
tion  techniques  are  becoming  more  pressing.  Unfortunately,  methods  for 
achieving  greater  efficiency  are  frequently  overlooked  in  developing  simula¬ 
tions.  This  can  generally  be  attributed  to  one  or  more  of  the  following  reasons: 

•  Analysts  usually  seek  advanced  computer  sj  stems  to  perform 
more  complex  simulation  studies  by  exploiting  increased 
speed  and/or  storage  capabilities.  This  is  often  achieved 

at  a  considerably  increased  expense. 

•  Many  efficient  simulation  methods  have  evolved  for  specialized 
applications.  For  example,  some  of  the  most  impressive 
Monte  Carlo  techniques  have  been  developed  in  radiation  trans¬ 
port,  a  discipline  that  does  not  overlap  into  areas  where  even 

a  small  number  of  simulation  analysts  are  working. 

•  Known  techniques  are  not  developed  to  the  point  where  they  can 
be  easily  understood  or  applied  by  even  a  small  fraction  of  the 
analysts  who  are  performing  simulation  studies  or  developing 
simulation  models. 
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In  addition  to  the  above  reasons,  comprehensive  references  describing  ef¬ 
ficient  methodologies  to  improve  Monte  Carlo  simulation  are  not  available. 

It  is  the  intent  of  these  volumes  to  help  alleviate  the  above  shortcomings  in 
Monte  Carlo  simulation. 

This  document  is  the  third  of  three  volumes  which  present  techniques 
and  methods  for  developing  efficient  Monte  Carlo  simulations.  Each  volume 
is  essentially  a  self-contained  discussion  of  useful  techniques  which  can  be 
applied  in  reducing  computational  effort  in  one  of  the  following  three  major 
aspects  of  Monte  Carlo  simulation: 

•  Selecting  Probability  Distributions  -  Volume  I 

•  Random  Number  Generation  For  Selected  Probability 
Distributions  -  Volume  II 

•  Variance  Reduction  -  Volume  III 

The  purpose  of  these  volumes  is  to  provide  guidance  in  developing 
Monte  Carlo  simulations  that  accurately  reflect  the  behavior  of  various  char¬ 
acteristics  of  the  system  being  simulated  and  are  most  efficient  in  terms  of 
computational  effort.  The  basic  intent  is  to  provide  understanding  of  the  con¬ 
cepts  and  methods  for  reducing  analysis  and  computational  effort  as  well  as 
to  serve  as  a  practical  guide  for  their  application.  They  have  been  prepared 
primarily  for  the  systems  analyst  and  computer  programmer  who  have  a 
basic  background  and  experience  in  simulation  and  elementary  statistics. 
Thus,  the  material  is  presented  so  as  to  preclude  extensive  knowledge  of 
statistical  techniques  or  of  extensive  literature  search.  However,  it  is 
assumed  the  reader  has  a  grasp  of  the  fundamentals  of  Monte  Carlo  methods, 
simulation  modeling,  and  elementary  statistics. 
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VARIANCE  REDUCTION 


1.  INTRODUCTION 

A  useful  feature  of  Monte  Carlo  simulation  is  that  the  analyst  has 
the  flexibility  to  dictate  his  simulation  conditions  and  sampling  plans  to 
a  much  greater  extent  than  does  an  experimenter  in  a  real  world  environ¬ 
ment.  This  extra  latitude  provides  an  excellent  opportunity  for  optimal  de¬ 
sign  of  simulations  to  obtain  estimates  with  minimal  sampling  size.  This 
will  effectively  reduce  the  time  and  effort  involved  in  computation  as  the 
number  of  trials  necessary  to  achieve  a  given  accuracy  is  thereby  reduced. 
In  view  of  the  large  number  of  situations  where  simulation  results  can  be 
substantially  improved,  it  is  fair  to  say  that  no  simulation  problem  has  been 
justly  treated  until  the  possibility  of  applying  variance  reduction  techniques 
has  been  seriously  considered. 

The  procedures  which  are  available  in  the  design  of  a  Monte  Carlo 
simulation  for  minimizing  the  required  sample  size  are  generally  called 
variance  reduction  techniques.  The  intent  here  is  to  provide  the  analyst 
with  an  understanding  of  and  an  appreciation  for  several  variance  reduction 
techniques  and  to  provide  a  useful  guide  for  selecting  and  using  the  most 
appropriate  technique  for  his  particular  problem. 

It  is  difficult  to  provide  a  complete  perspective  on  variance  reduction 
techniques.  This  is  primarily  due  to  the  fact  that  there  are  an  infinite  num¬ 
ber  of  ways  Monte  Carlo  simulation  can  be  accomplished  for  a  given  problem 
and  each  could  conceivably  be  used  to  calculate  the  simulation  objective  al¬ 
though  with  greatly  different  efficiencies.  However,  it  appears  fair  to  say 
that  the  approach  to  improving  simulation  efficiency  was  not  seriously 
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considered  until  the  work  on  the  atomic  bomb  during  the  Second  World 

War.  'hi.s  work  initially  involved  the  use  of  ’’straightforward”  Monte 

Carlo  simulation  for  nuclear  particle  transport,  but  early  in  these  investi- 

(18) 

gations  Von  Neumann  and  Ulamv  ’  applied  certain  variance  reduction  tech¬ 
niques.  A  systematic  development  of  these  techniques  was  presented  by 

(19) 

Harris  and  Lahn  about  1948.'  Although  comprehensive,  this  detailed  work 
is  difficult  to  apply  to  general  problems.  Subsequent  application  and  develop¬ 
ment  of  variance  reduction  techniques  has  been  almost  exclusively  carried 
out  within  the  radiation  transport  community.  This  has  resulted  in  limited 
application  in  other  areas  where  Monte  Carlo  simulation  is  used.  It  is  the 
purpose  of  this  document  to  provide  a  mechanism  to  aid  in  a  wider  app’ication 
of  variance  reduction.  This  has  been  attempted  by  presenting  the  material 
in  two  parts. 

Part  I,  BASIC  CONCEPTS  OF  VARIANCE  REDUCTION,  presents 
the  fundamental  principles  and  relationships  among  several  variance  reduc¬ 
tion  techniques.  Part  I  is  intended  to  provide  the  reader  with  a  background 
and  an  understanding  of  variance  reduction.  It  is  recommended  that  the  user 
who  is  not  familiar  with  the  basic  concepts  review  Part  I  before  attempting  to 
implement  variance  reduction. 

Part  II  of  this  volume,  APPLICATION  OF  VARIANCE  REDUCTION 
TECHNIQUES,  comes  as  close  as  currently  practical  to  being  a  step-by-step 
procedure  for  application  of  variance  reduction.  However,  the  reader  should 
have  an  understanding  of  the  basic  principles  involved.  In  most  cases  con¬ 
siderable  ingenuity  and  insight  will  also  be  necessary.  The  approach  here  has 
been  to  present  a  conveniettf.  characterization  of  the  various  methods  con¬ 
sidered  for  purposes  of  selection.  This  is  followed  by  a  summary  of  guide¬ 
lines  on  how  to  actually  apply  each  method. 
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This  volume  also  includes  other  information  useful  in  applying  vari¬ 
ance  reduction  techniques  to  Monte  Carlo  problems.  Appendix  /  presents 
a  summary  of  the  pertinent  analytical  formulations  and  Appendix  E  is  an 
abstracted  bibliography  of  useful  references. 
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2.  CHARACTERIZATION  OF  VARIANCE  REDUCTION  TECHNIQUES 


In  this  section  the  general  characteristics  of  variance  reduction  tech¬ 
niques  will  be  introduced.  In  Section  3  each  method  will  be  discussed  in  detail. 

2. 1  CLASSIFICAT ION  OF  TECHNIQUES 

As  the  name  implies,  variance  reduction  is  concerned  with  increasing 
the  accuracy  of  Monte  Carlo  estimates  of  parameters.  A  simulation  using  one 
or  more  reduction  techniques  can  be  contrasted  with  what  may  be  considered 
the  crude  (sometimes  called  direct  or  straightforward)  Monte  Carlo  approach 
where  an  attempt  is  made  to  create  true-to-life  or  actual  models  of  the  process. 
In  crude  sampling,  flows  through  the  model  and  sampling  probability  distribu¬ 
tions  are  chosen  to  reflect  the  real  situation  as  exactly  as  possible.  On  the 
other  hand,  variance  reduction  techniques  attempt  to  increase  the  effectiveness 
of  the  Monte  Carlo  method  by: 

•  Modifying  the  simulation  procedure 

•  Utilization  of  approximate  or  analytical  information 

•  Studying  the  system  within  a  different  context  or  abstract 
representation 

Based  on  these  approaches  a  general  classification  of  several  known  variance 
reduction  schemes  is  presented  in  Table  2. 1.  Many  of  the  techniques  presented 
in  Table  2. 1  are  related  and  it  is  difficult  to  arrive  at  a  completely  distinct 
classification.  However,  the  manner  in  which  they  are  presented  here  is  useful 
for  subsequent  discussions. 

Modifying  the  sampling  process  is  usually  achieved  by  using  more  ef¬ 
fective  sampling  techniques  or  altering  the  sampling  distributions.  As  an 
example  consider  the  problem  of  estimating  the  probability  of  an  early  failure 
in  a  piece  of  electronic  equipment,  and  suppose  that  the  failure  distribution 
for  this  equipment  is  exponential  with  a  very  long  mean  time  between  failures 
(MTBF).  In  a  crude  Monte  Carlo  evaluation  the  ratio  of  the  number  of  early 
failure  to  the  total  number  of  simulated  failures  is  very  small.  Thus,  in 
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TABLE  2. 1 

Classification  of  Variance  Reduction  Techniques 

•  MODIFICATION  OF  THE  SAMPLING  PROCESS 

•  Importance  Sampling 

•  Russian  Roulette  and  Splitting 

•  Systematic  Sampling 

•  Stratified  Sampling 

•  USE  OF  ANALYTICAL  EQUIVALENCE 

•  Expected  Values 

•  Statistical  Estimation 

•  Correlated  Sampling 

•  History  Reanalysis 

•  Control  Variates 

•  Antithetic  Variates 

•  Regression 

•  SPECIALIZED  TECHNIQUES 

•  Sequential  Sampling 

•  Adjoint  Formulation 

•  Transformations 

•  Orthonormal  Functions 

•  Conditional  Monte  Carlo 

order  to  generate  confidence  in  an  estimate  for  the  probability  of  early 
failure,  one  must  simulate  a  very  large  number  of  failures.  The  num¬ 
ber  of  simulated  events  required  can  be  substantially  reduced,  however,  if 
the  failure  distribution  in  the  simulation  is  suitably  modified.  In  particular, 
if  an  exponential  distribution  with  a  short  MTBF  is  substituted  for  the  actual 
failure  distribution,  more  early  failures  will  be  observed,  and  thus  a  more 
accurate  answer  can  be  derived  with  lesF  simulation  effort.  This  procedure 
is  referred  to  as  importance  sampling.  Of  course,  the  modifications  intro¬ 
duced  in  the  sampling  distribution  must  be  accounted  for  when  determining 
the  desired  estimate  since  the  failure  processes,  ('actual  and  modified)  are 
not  the  same. 
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The  above  example,  simulating  events  of  very  low  probability,  illus¬ 
trates  one  area  where  variance  reduction  techniques  are  always  beneficial,  if 
not  an  absolute  necessity.  If  the  occurrence  of  an  event  in  a  process  is  on  the 
order  of  one  in  a  thousand,  then  one  would  expect  an  event  to  occur  only  once 
in  every  thousand  direct  simulations  of  the  process.  Since  the  accuracy  in 
measuring  an  event  is  related  to  the  number  of  times  it  occurs,  the  crude 
simulation  has  to  be  run  many  thousands  of  times  before  much  accuracy  is 
achieved.  The  common  variance  reduction  procedure  in  these  cases  involves 
altering  the  simulation  in  a  Known  way  so  that  the  rare  events  can  be  ob¬ 
served  more  frequently. 

Other  forms  of  variance  reduction  are  based  on  the  fact  that  analytic 
procedures  are  usually  preferable  to  simulation.  Thus,  reverting  to  simu¬ 
lation  implies  the  problem  does  not  have  a  readily  available  analytic  solution. 
However,  in  many  cases  segments  of  the  process  may  be  amenable  to  deter¬ 
mining  a  closed  form  solution.  In  other  cases,  the  overall  process  or  seg¬ 
ments  of  the  process  may  be  closely  correlated  to  a  simpler,  approximate 
process  with  known  analytic  solutions.  In  both  situations  substantial  improve¬ 
ment  can  be  realized  by  taking  advantage  of  this  knowledge.  This  class  of 
techniques  is  described  by  the  term  "use  of  analytical  equivalence". 

As  a  simple  example  of  the  use  of  analytical  equivalence,  consider 
again  a  piece  of  electronic  equipment.  Suppose  this  time,  however,  that  the 
failure  distribution  of  the  equipment  is  not  exponential,  but  assume  that  the 
exponential  distribution  may  serve  as  a  first  approximation  to  it.  The  correla¬ 
tion  approach  to  variance  reduction  involves  investigating  the  failure  proper¬ 
ties  of  this  equipment  by  tatt/g  advantage  of  this  knowledge  and  simulating 
the  difference  between  the  actual  and  the  approximate  exponential  failure  rate 
instead  of  simulating  the  actual  process.  The  properties  of  the  actual  process 
can  then  be  inferred  using  the  analytic  properties  of  the  exponential  distribu¬ 
tion  and  the  results  from  the  simulation  on  the  difference  between  the  actual  and 
exponential  distribution.  This  approach  is  called  control  variates. 
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In  addition  to  sampling  modification  and  analytical  equivalence,  there 
are  certain  specialized  techniques  that  can  be  used  to  achieve  variance 
reduction.  These  procedures  may  include  the  application  of  one  or  more  of 
the  above  techniques  in  its  implementation.  One  powerful  procedure  is  called 
sequential  Monte  Carlo.  In  order  to  effectively  employ  variance  reduction  in 
a  simulation,  some  knowledge  about  the  process  and  the  answers  to  be  genera¬ 
ted  must  exist.  One  way  to  gain  this  information  is  through  a  direct  simula¬ 
tion  of  the  process.  Results  from  this  simulation  can  then  be  used  to  define 
variance  reduction  tecbxvv^s  which  will  refine  and  improve  the  efficiency  of 
a  second  simulation,  fa  complex  problems,  several  iterations  may  be  called 
for. 

Another  procedure  which  often  proves  valuable  in  developing  variance 
reduction  procedures  is  to  consider  the  process  from  various  viewpoints.  In 
many  flow  processes,  for  example,  hints  for  effective  importance  functions 
can  be  gained  by  considering  the  process  in  reverse  or  looking  at  the  mathe¬ 
matical  adjoint  of  the  problem  under  study.  However,  as  with  many  of  the 
specialized  techniques  described  in  Table  2. 1,  it  is  not  adequately  developed 
for  general  application. 

Generally  variance  reduction  techniques  can  be  aimed  at  reducing  the 
variance  of  the  estimate  of  only  one  parameter  or  aspect  of  the  process 
being  simulated.  Using  variance  reduction  techniques  on  one  parameter  can 
reduce  the  effectiveness  of  the  simulation  to  estimate  other  parameters.  It 
is  very  important,  therefore, to  first  determine  all  of  the  results  which  will 
be  desired  from  the  simulation  before  searching  for  a  technique  to  apply  to  a 
given  situation. 

If  several  quantities  (parameters)  are  to  be  estimated  by  the  simula¬ 
tion,  the  selection  of  a  variance  reduction  technique  has  to  be  considered 
from  the  standpoint  of  all  of  these  parameters.  In  many  circumstances  it 
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may  be  beneficial  to  create  a  different  Monte  Carlo  method  to  estimate  each 
parameter.  The  goal  for  each  simulator  would  be  efficient  measurement  of 
a  specific  parameter. 

Each  of  the  techniques  or  procedures  introduced  in  Table  2. 1  will  be 
discussed  in  detail  in  subsequent  sections. 

2. 2  VARIANCE  REDUCTION  AND  KNOWLEDGE  OF  THE  PROCESS  TO 

BE  SIMULATED 

As  the  discussion  of  the  previous  section  suggests,  variance  reduction 
can  be  viewed  as  a  means  to  use  known,  usually  qualitative,  information  about 
the  process  in  an  explicit  and  quantitative  manner.  In  fact,  if  nothing  is  known 
about  the  process  to  be  simulated,  variance  reduction  cannot  be  directly 
achieved.  (However,  sequential  sampling  may  be  used  to  generate  the  required 
knowledge.)  The  other  extreme  from  no  knowledge  is  complete  knowledge, 
and  in  this  case  a  zero  variance  simulation  can  be  devised.  Put  very  simply, 
var  iance  reduction  techniques  cannot  give  the  user  something  for  nothing;  it 
is  merely  a  way  of  not  wasting  information.  Therefore,  the  more  that  is  known 
about  the  problem,  the  more  effective  variance  reduction  can  be  and  the  more 
powerful  are  the  techniques  that  can  be  employed.  Hence,  it  is  always  impor¬ 
tant  to  clearly  define  as  much  as  is  possible  what  is  known  about  a  problem. 

Knowledge  of  a  process  to  be  simulated  can  be  qualitative  and/or 
quantitative.  Both  are  useful.  It  is  important  to  use  all  the  information  avail¬ 
able,  and  in  fact  it  may  be  useful  to  do  limited  crude  simulations  of  the  process 
to  gain  some  knowledge,  especially  if  a  little  data  might  lead  to  extensive 
insight.  Selection  of  a  variance  reduction  technique(s)  for  a  particular  simu¬ 
lation  is  thus  peculiar  to  that  simulation,  and  general  procedures  are  difficult 
to  establish.  However,  the  mental  exercise  and  the  initial  groundwork  that 
must  be  established  in  order  to  select  or  evalute  the  usefulness  of  applying 
these  techniques  is  almost  always  worth  the  effort.  Searching  for  a  technique 
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forces  the  simulation  designer  into  asking  the  basic  questions  of:  (1)  "What 
answers  are  to  be  generated  from  the  simulation,"  and  (2)  what  is  known  about 
the  behavior  of  the  process"? 

Problem  definition  is  thus  of  paramount  importance.  Before  consider¬ 
ing  variance  reduction  techniques  it  is  important  to  characterize  aspects  of 
the  problem  which  might  indicate  which  might  be  fruitfully  applied.  To  evalu¬ 
ate  the  usefulness  of  these  methods  for  a  particular  problem  it  is  necessary 
to: 

•  List  all  of  the  parameters  to  be  estimated  from  the  simulation. 

•  Determine  all  the  available  knowledge  on  the  internal  workings 
cf  the  process  to  be  simulated. 

In  fact, clearly  delineating  such  information  is  the  basis  for  the  approach  pre¬ 
sented  in  Part  n,  APPLICATION  OF  VARIANCE  REDUCTION  TECHNIQUES. 

2.  3  INTEGRAL  REPRESENTATION 

In  principle  a  Monte  Carlo  procedure  can  be  interpreted  as  a  method 
for  evaluating  an  integral,  or  more  graphically,  the  area  under  a  curve.  Since 
integrals  can  also  be  evaluated  by  analytic  or  numerical  methods,  reverting 
to  Monte  Carlo  simulation  implies  either  a  very  complex  integration  or, 
more  generally,  an  inability  to  represent  the  problem  in  integral  form.  Knowl¬ 
edge  that  the  Monte  Carlo  proceduredoes  have  an  integral  representation  and 
determining  the  explicit  form  cf  that  integral  is  fundamental  to  understanding 
and  developing  variance  reduction  techniques. 

An  intuitive  justification  for  the  integral  representation  can  be  given 
by  considering  how  the  Monte  Carlo  method  works.  The  model  of  the  process, 
or  simulation,  is  exerc/sed  numerous  times.  Conclusions  about  the  process 
are  drawn  by  averaging  the  individual  outcomes.  From  a  probabilistic  view¬ 
point,  averaging  is  a  means  for  estimating  particular  types  of  integrals  known 
as  expectations  or  expected  values. 
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Symbolically,  suppose  g(X^, . . .  ,Xft)  is  the  outcome  or  result  obtained 
from  a  simulation.  The  X j  values  represent  a  particular  outcome*  from 
each  of  the  random  processes  affecting  the  characteristic  of  the  system  being 
estimated.  To  simplify  the  presentation,  let  x  represent  the  vector 
(Xj,...,x  ).  If  f(x)  denotes  the  probability  density  function  of  2  (i.e. , 
joint  probability  density  function  of  xi»  •  •  • » xn) »  then  the  objective  of  the 
Monte  Carlo  simulation  is  to  estimate  the  integral 

I  =  E[g(x)]  =  J  g(x)f(x)dx  .  (2.1) 

A  crude  application  of  Monte  Carlo  would  obtain  an  estimate  I  by 
selecting  a  random  sample  . . . ,  from  f(x)  and  compute  the  sample 
mean  using 

N 

i  =  etfj)  (2-  2) 

ui 

*  (14) 

The  law  of  large  numbers  ensures  the  convergence  of  I  to  I  in  most  cases. 

A 

It  is,  of  course,  true  that  1  is  a  random  variable  and  that  the  expected 
value  of  I  equals  I.  That  is, 

E[f]  =  I  (2.3) 

A 

It  is  said  that  I  is  an  unbiased  estimator  for  I  when  (2. 3)  holds.  This  is 
important  to  keep  in  mind  when  estimators  for  variance  reduction  are  con¬ 
structed  since  variance  reduction  can  lead  to  biased  estimators  unless  care 
is  taken. 

+ 

Using  general  notation,  X  represents  a  particular  outcome  of  the  random 
variable  x. 
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An  estimate  of  the  error  in  the  estimator  I  is  given  by  the  sample 


variance  S  ,  where 


N 


N 


s2  =  (ITD  L  -  il2  -  f£l  H  £  *  <*i> 

i=l  I 


(2.4) 


2  2 
S  is  commonly  used  as  an  estimate  for  a  ,  the  population  variance, 


which  is  defined  as 


a2  -  E[(g(J)  -I)2]  (2.5) 

o 

S  is  also  used  as  a  basis  for  evaluating  the  effectiveness  of  Monte 

Carlo  simulations.  A  basic  measure  for  such  effectiveness  if  E[(l-I)2].  It  is 
* 

easy  to  see  that 


E[(I-I)2]  =  a2/N 


(2.6) 


Note  that  as  N  -  - ,  E[(i-I)2]  -*  0.  ** 

2  2  2 

Now,  since  a  is  estimated  using  S  ,  an  estimate  for  E[(I-I)  ]  is  con¬ 
structed  using 

s2  _  S2  1  fl 

TT  ' 

The  estimator  s  is  often  used  as  an  absolute  measure  of  the  accuracy  of 
a  simulation. 


fy^-i2  (2.7) 

ui  ) 


It  is  assumed  that  a  simulation  will  consist  of  N  statistically  independent 
histories. 

*Since  E[(l-1)2}  -*0  as  N  -» » ,  then  I  is  said  to  be  a  consistent  estimator 
for  I. 
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Use  of  the  integral  representation  provides  a  convenient  mechanism 
to  develop  and  apply  variance  redaction  in  simulation,  and  if  possible,  such 
a  representation  should  be  established.  As  a  trivial  example  of  how  this  might 
be  accomplished  consider  the  queueing  system  shown  in  Fig.  2. 1.  Here  t 
indicates  time.  Further  it  is  assumed  that  f^(t), . . .  ,f?(t)  are  probability 
density  functions  for  the  time  required  to  go  through  the  corresponding  box. 

P11  and  P12  are  respective  probabilities  for  going  along  the  paths  indicated. 
Similarly  for  p21  and  p22< 

It  is  easy  to  see  that  the  average  time  to  pass  through  the  system 
is  given  by 

I  =  f  t[fx(t)  +  pnf2(t)  +  pl2f3(t)  +  f4(t)  +  f5(t)  +  p21f6(t)  +  f7(t)]dt 
=  f  t  f(t)dt 

which  has  the  same  qualitative  form  as  Eq.  2. 1.  Such  integral  representa¬ 
tion  can  greatly  simplify  the  application  of  variance  reduction  techniques 
and  will  be  used  as  a  basis  for  the  discussion  presented  later. 


Fig.  2. 1.  Schematic  of  a  Simple  Queueing  System 
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2. 4  EFFICIENCY  OF  VARIANCE  REDUCTION 


This  section  presents  the  basic  ideas  and  practical  expressions  for 
estimating  the  efficiency  cf  variance  reduction  techniques. 


2.4.1  General  Concepts 


The  measure  introduced  in  the  previous  discussion  that  will  be  used 

to  evaluate  the  effectiveness  of  a  simulation  was  E[(I-I)  ].  This  is  estimated 
o 

using  s  defined  by  (2. 7).  That  is, 


1 


is  an  estimate  for  the  variance  of  I. 


It  can  be  shown  that 


(2.8) 


E[s2]  =  E[(I  -  I)2]  =  o2/N  (2. 9) 

where  or2  is  the  variance  of  g(x)  and  N  is  the  sample  size  or  the  number 
of  histories. 

It  can  be  seen  from  (2. 9)  that^  as  the  number  of  histories,  N,  in- 
* 

creases,  the  closer  I  will  come  to  I. 

Another  way  to  consider  this  is  in  terms  of  intervals  of  uncertainty. 

(14) 

For  example  it  is  known  from  basic  statistics'  that,  with  high  probability 
the  estimate  I  will  fall  between  1  -  ko/^ST  and  I  +  k o//T  where  k  is  some 
constant.  Thus  for  a  fixed  k,  the  convergence  of  the  estimate  is  related 
to  the  number  of  histories  ,  N,  and  the  variance  of  g(x). 

Two  approaches  can  be  taken  to  increase  the  accuracy  of  the  estimator, 
I.  One  is  to  increase  the  number  of  histories.  The  other  is  to  reduce  the 
variance  (a)  associated  with  each  observation.  The  disadvantage  of  increasing 
the  number  of  iterations  (i.e. ,  the  size  of  N)  is  obvious.  For  example,  to  re¬ 
duce  the  interval  of  uncertainty  by  a  factor  of  two,  thus  doubling  the  accuracy, 
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four  times  as  many  histories  would  be  required  (for  a  fourfold  increase  in 
computing  time).  Eventually  it  becomes  prohibitively  expensive  to  gain 
further  accuracy  by  increasing  the  number  of  histories.  Therefore, 
achieving  variance  reduction  which  reduces  the  variance  associated  with 
each  history,  a ,  is  highly  desirable  for  improvements  in  the  answers. 

To  evaluate  the  efficiency  gained  in  the  use  of  variance  reduction 

techniques  it  is  clearly  desirable  to  have  a  quantitative  measure.  This  can 

readily  be  established  based  on  the  ideas  introduced  above.  Suppose  two 

simulation  method  exist  for  estimating  the  same  parameter  I.  Let  the 

2 

variance  per  history  associated  with  the  first  simulation  method  be  o.  and 

2  1 

that  associated  with  the  second  be  ag.  It  is  desired  that  the  result  be  known 

within  an  uncertainty  of  e  (i.e. ,  the  estimate  I  fall  in  the  interval  I-c 

2  2  2 

to  I+c).  For  this  to  happen  with  high  probability  will  require  =  k  a^/f 

histories  for  the  first  method.  For  the  second  method,  it  will  require 

N?  =  k  ag/ c  histories.  In  general  the  two  methods  will  require  different 

amounts  of  computational  effort  to  generate  each  history.  Let  the  computer 

time  taken  per  history  by  the  first  method  be  t^  seconds  and  by  the  second 

t9  seconds.  Then  the  total  time  required  for  the  first  method  to  achieve  the 

desired  accuracy  would  be  k  a.t./c  .  Total  time  for  the  second  method 
2  2  2  t  * 

would  be  k  Og  tg/c  .  The  relative  efficiency  of  the  two  simulation  methods 
is  given  by  the  ratio  of  the  computing  time3  required.  Thus, 


efficiency  =  c 


(2. 10) 


which  is  the  relative  time  advantage  gained  by  using  the  second  method . 

In  most  applications  a  variance  reduction  method  is  being  compared 

2 

to  crude  sampling.  That  is,  and  o^  would  be  that  obtained  when  crude 
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sampling  is  used,  while  and  a2  re*er  to  the  computation  using  the  vari¬ 
ance  reduction  method. 


2. 4.  2  Estimation  of  Variance  Reduction  Efficiency 

2 

The  difficulty  in  using  definition  (2. 10)  for  efficiency  is  that  a.  and 

2  1 
Og  are  rarely  known.  However,  it  is  reasonable  to  replace  them  by  their 

estimators  and  get  an  estimator  for  r  , 


t  s2 

*lsl 


(2.11) 


*2S2 


where 


ifii  r1Zg2(5i)-‘Ii 

1  I  1  i=l 


(2. 12) 


with 


"'i  ■  frZ  8(i9 

1  i=l 


(2. 13) 


and  3^, . . .  ,3^  being  a  random  sample  obtained  with  crude  Monte  Carlo. 
Also,  1 


o2  2  IT  *2 

S2  N2-l  n2^  g  ^  ’*2 


(2. 14) 


with 


h  * 


(2. 15) 
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and  X' , . . .  ,X'  being  a  random  samp le  obtained  using  variance  reduction. 

1  Ng 

It  is  important  to  recognize  that  e  is  a  random  variable  and  in  practi- 

2  2 

cal  application  will  be  subject  to  random  variations.  In  fact,  as  and  Sg 
are  second  order  quantities,  they  will  be  subject  to  much  larger  variation 

A 

than  first  order  parameters  such  as  I. 

Note  that  the  use  of  (2.12)  and  (2. 14)  assumes  that  independent  ran¬ 
dom  histories  were  available.  However,  the  application  of  many  variance 
reduction  techniques  will  not  produce  histories  that  are  statistically  indepen¬ 
dent.  This  is  particularly  true  when  stratified,  systematic  sampling,  or 
Russian  Roulette  and  splitting  are  used.  In  some  cases  correlated  sampling 
and  history  reanalysis  will  also  produce  samples  that  are  not  independent . 

In  cases  where  a  truly  random  sample  is  not  available  (or  suspected 
to  be  not  available),  it  is  convenient  to  use  a  batching  process  to  estimate 
the  sample  variance.  The  general  guidelines  to  follow  in  application  of  batch¬ 
ing  are: 

1.  Obtain  a  sample,  say  g(jL), . . .  ,g(5Lj)  consisting  of  N  his¬ 
tories  (which  may  or  may  not  be  independent). 

2.  Group  the  histories  into  batches  such  that  the  batches  are  in¬ 
dependent  and  equivalent.  For  example,  it  may  be  possible 
to  arrange  the  histories  so  that  the  sample  contained  within 
any  batch  will  be  independent  from  the  samples  in  any  other 
batch.  However,  the  samples  within  a  batch  may  be  correlated 
with  each  other.  In  the  case  of  stratified  sampling,  each  batch 
must  consist  of  the  same  number  of  samples  from  the  same 
strata.  (Typically,  the  number  of  batches,  Ng,  should  be  be¬ 
tween  10  and  50. ) 


3.  Construct  an  average  in  each  batch  for  the  parameters  being 
estimated.  That  is,  if  g(5L), . . .  ,g$N  )  are  contained  in 
batch  1,  then  set  1 

h  -  F.  Z  6<*i>  (2 

1  i=l 

where  it  is  assumed  that  there  are  sample  in  each  batch. 
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4.  Construct  a  final  estimate  for  I  using 


Nfi 


B  1 


i=l 


5.  Obtain  an  estimate  for  a  from 


s2  = 


m 


N 


N 


B 


Nr  2-  i 
B  i=l 


(2. 17) 


(2. 18) 


In  essence,  each  batch  is  being  considered  a sa  separate  small  simu¬ 
lation  run.  Parameters  are  estimated  as  the  average  of  the  estimates  ob¬ 
tained  in  each  batch.  The  sample  variance  of  the  different  batch  estimates  pro¬ 
vides  a  basis  for  estimating  the  variance  of  the  final  average.  This  technique 
is  completely  general;  it  will  work  in  all  cases  no  matter  what  combination 
of  variance  reduction  techniques  are  being  used  nor  what  kind  of  parameter 
is  being  estimated.  Batching  may  not  provide  the  best  estimate  in  all  cases; 
usually  a  better  estimator  can  be  constructed  for  any  particular  techniques 
being  used.  However,  there  frequently  are  easily -missed  subtleties  in  en¬ 
suring  that  an  estimator  is  based  on  independent  and  equivalent  samples.  It 
is  generally  best  to  avoid  the  analysis  required  to  generate  an  estimator  valid 
for  the  particular  methods  employed  -  and  also  avoid  the  pitfall  of  constructing 
an  erroneous  estimator  -  by  using  batching  to  calcu\a!tp  variances. 

2.4.3  Estimation  of  Confidence  Intervals 

In  some  applications,  it  is  of  interest  to  calculate  confidence  inter¬ 
vals  for  estimated  parameters  when  variance  reduction  is  used.  Under  the 

(14) 

usual  assumptions,  the  confidence  interval  of  size  a  can  be  obtained 
from  the  following  expression: 
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(2. 19) 


P 


/IT  /RT 


a 


where  a  may  be  obtained  from  Table  2. 2.  The  value  of  S  may  be  obtained 
using  (2.4)  or  (2. 18).  Then,  the  interval  I  -  ;  I  +  ZlL  is  said  to  be 

/W  Jn 

a  100  a%  confidence  interval  for  the  estimate  of  I. 


2. 5  THE  PITFALLS  OF  OVERBIASING  AND  UNDERBIASING 

The  goal  of  variance  reduction  is  improved  efficiency,  that  is,  making 
the  best  use  of  computing  time  to  simulate  events  which  are  most  significant 
to  the  final  answer.  In  modifying  the  sampling  to  bring  this  about,  it  is 
possible  to  overshoot  the  mark  and  produce  a  sampling  scheme  that  is  so 
strongly  biased  as  to  be  less  efficient  than  crude  samp '  ing.  This  is  termed 
'overbiasing'  or  'oversampling'.  The  opposite  term,  'underbiasing'  or 
'undersampling',  is  used  to  apply  to  the  crude  or  slightly  modified  sampling 
scheme  when  the  result  depends  heavily  on  infrequent  events  and  not  enough 
observations  occurred  for  good  statistics. 

It  is  a  general  characteristic  of  both  overbiased  and  underbiased 
situations  that  most  of  the  time  the  answers  generated  are  too  small.  This 
produces  an  apparently  consistent  bias  in  the  results  which  can  be  more 
troublesome  than  poor  confidence  intervals  in  the  result.  Furthermore, 
variance  estimates  are  also  generally  small  so  that  the  confidence  intervals 
calculated  in  the  simulation  will  tend  to  indicate  that  the  results  are  much 
more  accurate  than  they  really  are.  This  generates  a  false  sense  of  security 
and  faith  in  results  which  are  actually  consistently  bad. 

As  an  extremely  simplified  example,  consider  a  simulation  in  which 
there  are  basically  two  classes  of  events.  One  type  of  event  (X^)  occurs 
frequently  ( ffXj)  =  .  9999)  but  contributes  only  a  small  amount  (g(Xj)  =  .  01)  to 
the  final  result  while  the  other  type  (event  Xg)  is  rare  (f(Xg)  =  .  0001)  but 
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TABLE  2.2 

Table  of  the  Standard  Cumulative  Normal  Distribution 
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makes  a  large  contribution  (g(Xg)  =  100)  when  it  does  occur.  In  this  example, 
the  integral  I  being  estimated  has  the  correct  value: 

I  =  £  (probability  of  event  i)*  (value  of  event  i' 
i 

-  £(X1)g(X1)  +  f(X2)g(X2) 

=  .  02  (2. 20) 

However,  using  crude  Monte  Carlo  with  a  moderate  (several  hundred  to  a 
thousand)  number  of  histories,  event  X2  would  very  probably  never  occur 
and  the  ’underbiased'  answer  would  be  recorded  as 

Iu  --  g(Xj)  =  .01  (2.21) 

If  it  was  realized  that  Xg  events  made  such  a  heavy  contribution  to  the  re¬ 
sult,  one  natural  response  would  be  to  modify  the  simulation  so  that  Xg 
events  occurred  frequently  (see  the  discussion  of  importance  sampling  in 
Section  3. 1. 1  for  an  explanation  ot  the  formulas  used  in  this  example).  If 
this  modification  was  carried  to  excess,  say  new  probabilities  of  f*(Xg)^.  9999 
and  f+(Xj)  =  .  0001  were  used,  then  X j  events  would  not  occur  in  a  run  of 
moderate  size  and  the  'overbiased '  estimate  would  turn  out  to  be 

‘o  =  g(x2)-pi^i =  10°-  “  01  •  <2-22) 

The  proper  modification  for  this  example  is  to  let  X^  and  Xg  events  occur 
with  equal  probability,  f*(Xj)  =  i*(Xg)  = .  5.  Then,  the  contribution  from 
each  history  is 

(2.  23) 
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and  the  final  estimate  from  a  small  sample  would  be 

t  =  .02  (2.24) 

The  above  example  is  somewhat  extreme  but  illustrates  the  general  nature 
of  most  simulations  where  variance  reduction  is  needed.  The  underlying  dis¬ 
tribution  is  highly  skewed  with  the  large  majority  of  cases  making  little  or 
no  contribution  to  the  final  answer  while  a  small  number  <f  cases  can  make 
large  contributions.  In  both  the  'overbiased 1  and  'underbiased'  example, 
the  final  estimates  were  s mailer  than  the  correct  value  aid  this  is  also  a 
general  characteristic  of  such  cases.  In  the  example,  if  a  set  of  100  his¬ 
tories  was  simulated  using  crude  Monte  Carlo,  then  most  likely  there 
would  be  no  Xg  events  observed  and  the  (incorrect)  estimate  would  be  .01. 
Once  in  every  100  sets  of  100  histories,  a  single  Xg  event  would  be  simu¬ 
lated.  For  that  set  of  histories  the  estimate  would  be 

r  =  1/100(99.  .01  +  1-100]- 1.01  ,  (2.25) 

\  number  veiy  much  larger  than  the  correct  value.  (Notice  that  this  makes 
the  estimation  average  out  correctly  in  the  long  run.)  Unfortunately,  at  this 
stage  the  human  factor  enters  the  problem.  Most  users  confronted  with  several 
similar  runs  giving  values  of  .01  and  one  run  giving  1.01  will  decide  that  the 
1.01  estimate  was  the  result  of  some  input  mistake  or  computer  error,  and 
throw  out  that  run. 

In  this  example  the  variance  estimates  produced  would  be  zero 
for  all  runs  except  the  one  in  a  hundred  which  had  a  mean  value  of  1.01. 

For  this  case  the  relative  standard  de\Sat\on  would  be  almost  100%,  a  sure 
sign  of  insufficient  sampling. 

Therefore  caution  is  recommended  in  simulations  where  most  his¬ 
tories  contribute  a  small  bit  to  the  answer  but  a  few  histories  contribute  a 
large  value,  and  complete  faith  should  not  be  placed  in  estimates  of  variance 
especially  when  the  results  are  smaller  than  expected  or  if  the  possibility  of 
overbiasing  or  underbiasing  is  suspected. 
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3.  VARIANCE  REDUCTION  TECHNIQUES 


To  provide  a  reasonable  presentation  of  variance  reduction,  it  is 
imperative  that  some  organization  be  given  to  relate  the  various  techniques. 
To  this  end  the  techniques  or  approaches  for  achieving  variance  reduction 
were  grouped  in  the  following  three  classes  which  were  introduced  in  the 
previous  section. 

•  Modification  of  the  sampling  process 

•  Use  of  analytical  equivalence 

•  Specialized  techniques 

A  summary  of  the  specific  variance  reduction  techniques  in  each  of  these 
classes  was  presented  in  Table  2. 1. 

The  techniques  which  modify  the  sampling  process  effectively  alter 
the  probability  distributions  of  the  random  variables  so  that  the  more  signi¬ 
ficant  events  are  observed  more  often.  The  use  of  analytical  equivalence 
exploits  analytical  expressions  and  expected  values  to  explain  or  approxi¬ 
mate  the  majority  of  the  phenomena,  thus  leaving  only  the  most  interesting 
portions  of  the  process  to  be  simulated.  Specialized  approaches  encompass 
the  more  sophisticated  techniques  for  achieving  variance  reduction. 

In  this  section  of  the  report,  the  techniques  presented  in  each  of  these 
three  classes  is  discussed  in  detail. 

3. 1  MODIFICATION  OF  THE  SAMPLING  PROCESS 

Variance  reduction  techniques  in  this  class  include: 

•  Importance  Sampling 

•  Russian  Roulette  and  Splitting 

•  Systematic  Sampling 

•  Stratified  Sampling 
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These  hare  several  common  characteristics  in  that  they  all  reduce 
the  variance  of  the  estimate  by  sampling  from  a  probability  distribution  dif¬ 
ferent  from  the  true  physical  distribution.  In  ibis  way  more  of  the  interest¬ 
ing  events  will  be  observed,  i.  e. ,  more  of  the  events  that  contribute  to  the 
result  being  estimated  will  be  observed  and  less  computing  time  will  be 
spent  on  events  of  no  importance  to  the  results.  These  techniques  also  pre¬ 
serve  the  actual  physical  process  of  the  system  in  the  simulation  mode.  Only 
the  probabilities  are  altered;  the  flow  of  events  remains  essentially  the  same. 

3. 1. 1 

3. 1. 1. 1  General  Concepts 

Under  this  scheme  the  sampling  distributions  which  would  be  used  in 
the  direct  simulation  are  replaced  with  ones  which  force  the  sampling  into 
more  interesting,  or  important  regions.  For  instance,  in  tossing  a  pair  of 
dice,  if  one  is  interested  in  the  occurrence  of  a  three,  one  could  weight  or 
bias  each  die  toward  the  numbers  one  and  two.  The  biasing  of  the  sampling 
distributions  is  done  in  a  known  fashion  so  that  this  information  can  be  used 
to  alter  the  computation  of  the  results  so  as  to  unbias  the  answers. 

Mathematically  the  importance  sampling  idea  can  be  illustrated  by 
considering  a  Monte  Carlo  estimate  of  a  parameter  I  where 

I  =  E[g(x)]  =  Jg(x)f(x)dx  .  (3.1) 

The  direct  or  straightforward  Monte  Carlo  procedure  would  be  as  follows: 

•  Select  a  random  aaxojta  X- , . . .  ,XK  from  the  distribution  with 
density  i(x) 

•  Estimate  I  using 

N 

I  =  •  (3.2) 

i=l 
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As  indicated  in  Section  2. 3,  the  sa tuple  variance  for  this  estimate  is  given 

by 

®2  *  fri£  ■'I'2  =  iri  r-2  g2(xi)  "  l2  (3-3) 
1=1  L  ‘=i 

Now  suppose  the  sampling  was  not  from  f(x),  but  rather  from  a 
distribution  f*(x).  Then  it  is  clear  from  (3. 1)  that  I  may  be  expressed  as 


I  =  rSfitosVwdx  (3.4) 

f*(x) 

where  it  is  assumed  f*(x)  does  not  go  to  zero  when  g(x)f(x)  is  not  zero. 

Now,  if  a  sampling  procedure  were  set  up  which  selected  a  random 

sample  . from  f*(x),  then  the  new  estimator  for  I  would  be 

given  by 


g(X1)f(XJ) 


*  i  6V-. i/ 

h  -  rl-iW 

i=i  1 


(3.5) 


f(X.) 

Thus,  when  is  selected  from  f*(x),  the  sample  is  weighted  by 
in  the  final  result.  Also,  the  sample  variance  is  given  by 

,  .  Jl,  |g(X  )f(X  )  .  I2  ( .  ”  rg(x)f(x  )f  J 

s  -krr-h  =& sE-fVH r6) 

i=l  1  I  i*lL  1  J  I 


It  is  of  interest  to  consider  the  expected  value  of  the  square  of  the  difference 

A 

between  1^  and  I.  That  is, 
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2 


E[(Ij  - 1)‘]  =  E 


/ 1  t'  A 


(3.7) 


Now  It  is  seen  that  if 


f*(x)  =  (3.8) 

o 

then  E[(I  j  - 1)  ]  =  0,  a  desirable  situation.  But  this  implies  the  ridiculous 
condition  that  I  is  known.  (This  is  the  extreme  situation  indicated  previously 
in  that  if  the  answer  is  known,  a  sampling  scheme  can  be  developed  with  ex¬ 
pected  zero  variance.)  However,  (3. 8)  does  suggest  that  if  something  close 
to  the  form  can  be  conveniently  selected  for  f*(x)  then  a  large 

improvement  in  the  simulation  should  be  possible.  For  example,  consider 
Fig.  3. 1.  which  qualitatively  shows  f(x)  and  .  a  reasonable  sampling 

function  f*(x)  which  approximates  is  indicated.  f*(x)  is  called  the 

importance  sampling  function  since  it  tends  to  emphasize  the  areas  where  the 
expression  is  most  important.  f*(x)  could  be  something  as  simple 

and  easy  to  work  with  as  an  exponential  or  normal  distribution.  The  aim  of 
importance  sampling  can,  therefore,  be  to  concentrate  the  distribution  of 
sample  points  in  the  parts  of  the  interval  which  are  most  important.  This 
demonstrates  again  the  utilization  of  knowledge  of  the  process  to  accomplish 
variance  reduction.  It  is  desirable  of  course  that  f*(x)  be  easy  to  work  with 
(i.  e. ,  integrable)  which  iB  usually  a  conflicting  requirement  to  having  f*(x) 
as  close  to  as  possible. 


+Note  that  if  g(x)  ever  changes  sign,  a  zero  variance  sampling  function  is  not 
so  easily  obtained  since  i*(x)  must  be  non-negative  to  be  a  density  function. 
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Fig.  3. 1.  Illustration  of  the  Importance  Sampling  Concept. 


3. 1. 1. 2  Comparison  ot  importance  Sampling  with  Straightforward  Sampling 

Unless  carefully  implemented,  importance  sampling  has  the  potential 
of  giving  worse  results  than  crude  sampling.  This  can  be  seen  by  a  com¬ 
parison  of  the  expected  values  of  the  sampling  variances  in  the  two  situations. 
That  is,  from  (3. 3)  and  (3. 6), 

E[S2-S2]  =  E[S2]-E[S2]  =  Jg2(x)  [l  (3.9) 

There  is  no  assurance  (3.9)  wiWYfe positive.  Therefore,  in  selection  of 
f*(x),  a  worse  result  could  be  obtained  from  the  selection  of  f*(x)  over 
f(x)  as  the  sampling  distribution.  This  can  be  avoided,  however,  by  care¬ 
ful  selection  of  the  importance  function  f*(x). 


3. 1. 1. 3  Extensions  of  Importance  Sampling  Concepts 

One  extension  of  interest  in  variance  reduction  is  in  applications 
involving  two  or  more  variables.  To  see  bow  an  importance  function  can 
be  developed  in  the  general  situation  consider  the  integral 


I  -  J\g(x)f($dJ  =  f  f*(x>dx  (3.10) 

i  x  f*® 

Now,  if  a  random  sample  Jfcj, . . .  is  obtained  from  the  importance  func¬ 
tion  f*(x),  then  the  estimator  for  I  is 


^  hi  **<*,> 


(3. 11) 


The  sample  variance  is  of  the  srme  form  as  (3. 6). 
As  in  (3. 1),  consider 

121 


-  E 


1 A  , 

M  <*<*!> 


_  1  ( |»  /g(x)f(x)V 

'  *  r*Us) ; 


g(x)f(x)\  f*^d£ 


(3. 12) 


As  in  (3. 7),  the  "best”  (i. e. ,  when  E[(I^-Ir]  =  0)  importance  func¬ 
tion  to  select  is 


f*(!)  =  «!&!$> 


(3. 13) 


The  arguments  for  selecting  f*(x)  is,  therefore,  identical  to  those 
used  for  selection  of  f*(x).  However,  in  practice  it  is  generally  difficult  to 
develop  f*(x)  due  to  the  multidimensional  aspects.  An  alternate  approach 
is  to  try  to  select  some  sort  of  conditional  importance  function.  For  example, 
suppose  x  =  (x,  y) .  Then  an  importance  sampling  function  for  x,  say  f*(x) 
can  be  developed  as  follows: 
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I  =  J  g(x,y)f(x,y)dxdy  =  J  g(x,y)f(x)f(y  |x)dxdy 
x,  y  x,  y 


=  4  v^Wl*)-*dy  . 


(3. 14) 


Now,  if  X^, . . . ,X^  is  selected  from  f*(x)  and  Yj,...,Yn  selected 
from  f(y  Ix^f*^),  then  the  estimator  for  I  is 

1  A  «*..  W> 

=  TT  Zmt  - ,ITen! -  *  (3*  ^ 


i=l 


TV? 


The  sample  variance  in  this  case  is 


S2  = 


"  N-lZ-f 
1=1 


ll 


1 

N-l 


iE 

i=l 


N  rg(Xi,Y1)f(Xi) 


-I 


(3. 16) 


In  a  manner  similar  to  that  used  to  arrive  at  (3. 12)  it  is  easy  to  see  that 


E[(5ri)2]  =  -jj-  u 


g(x,  y)f  (x) 


,xfy)i 


x,y 

=  |  Ix  Iy  g2(x»  y)f(yl*)dydx  - 12| 


f*(x,y)dxdy  - 14 


(3.17) 


But 


E[g2(x,y)|xJ  =  J  g2(x,y)f(y|x)dy 
y 


(3. 18) 
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so  the  "best"  importance  function  is 


f»(x)  =  f(x)|Efg2(x,y)|xj} _ 

j{E[g2(x,y)|x]J  f(x)dx 


1/2 


which  reduces  (3. 17)  to  -jp  j  [J*jE[g2(x, y)|x]  J  f(x)dx]2  . 


(3. 19) 


In  the  general  multidimensional  case,  it  follows  that  the  importance  function 
for  x  should  be 


f*(x)  =  «*>  —  (3. 20) 

J,j,E[g2(x,y)|x]j  f(x)dx 

where  y  refers  to  all  the  random  variables  except  x. 

The  estimator  for  I  and  the  sample  variance  are  given  respectively 
by  expressions  similar  to  (3. 15)  and  (3. 16). 

The  selection  of  the  ”best”  importance  function  implies  of  course 
that  the  answer  being  sought  is  known.  Thus,  it  is  clear  that  the  arbitrary 
selection  of  the  best  importance  function  would  be  a  matter  of  luck.  How¬ 
ever,  an  understanding  of  the  above  formulations  can  lead  to  guidance  to 
selecting  an  importance  function.  For  example  consider  (3. 20).  In  this 
case  it  may  be  possible  to  obtain  an  estimate  for  E[g2(x,  y)|x]  by  perform¬ 
ing  a  simulation  for  fixed  values  of  x  and  selecting  an  approximate  form  for 
the  results.  This  and  many  other  variations  become  readily  evident  when 
serious  considerations  of  importance  sampling  are  undertaken.  General 
guidelines  for  achieving  such  benefits  are  outlined  in  Part  H  of  this  document. 
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1 1  «  V1.-1  w"*'* 


3. 1. 1. 4  General  Areas  of  Applicability  for  Importance  Sampling 

Application  of  the  importance  sampling  technique  can  be  very  useful 
in  simulations  which  are  attempting  to  estimate  very  low  probability  events. 

One  of  the  major  areas  to  which  this  method  has  been  applied  is  in  nuclear 
physics  in  calculating  probabilities  concerning  nuclear  particle  behavior. 
Examples  are  estimating  the  probability  of  penetrating  a  shield  or  barrier  or 
analysis  of  the  wandering  of  particles  within  nuclear  reactors.  Application 
of  these  techniques  can  also  prove  fruitful  in  problems  which  are  more  oriented 
towards  operations  research.  For  example,  in  vulnerability  studies  of  weapons, 
the  number  of  critical  hits  on  a  target  can  be  increased  by  reducing  the  circular 
error  probability  (CEP)  of  the  weapon  from  that  normally  expected.  Another 
application  is  in  queueing  problems  where  improvements  in  estimates  for  the 
waiting  time  can  be  achieved  by  increasing  the  arrival  rate  or  increasing  the 
service  time. 

The  effectiveness  of  importance  sampling  techniques  are,  of  course, 
directly  related  to  the  ability  to  select  good  importance  sampling  distributions. 
This,  in  turn,  is  related  to  what  might  be  called  a  priori  or  beforehand  knowl¬ 
edge  of  the  process  being  simulated.  In  essence,  if  the  answers  to  the  ques- 
lions  being  sought  are  uyproximatelv  or  qualitatively  known,  then  very  good 
importance  functions  can  be  determined.  In  less  favorable  situations,  the  use 
of  importance  sampling  might  involve  an  iterative  simulation  procedure.  For 
example,  results  from  an  initial  simulation  might  be  used  to  generate  impor¬ 
tance  sampling  distributions  in  a  second  simulation.  Such  iterations  could 
proceed  through  several  stages. 

It  is  also  worth  noting  that  in  importance  sampling,  as  is  the  case  for 
most  variance  reduction  procedures,  the  samples  obtained  from  the  resulting 
simulation  may  be  less  effective  for  estimating  certain  quantities  than  crude 
sampling.  Since  the  importance  functions  are  selected  to  increase  the  ef¬ 
fectiveness  of  estimating  specific  quantities  or  parameters,  the  estimation  of 
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other  parameters,  not  involved  in  this  selection,  can  be  greatly  impaired  by 
this  procedure. 

3.1.2  Russian  Roulette  and  Splitting^’  19>20>36) 

3. 1.2. 1  General  Concepts 

This  technique  can  be  very  effective  in  problems  which  are  charac¬ 
terized  by  a  series  of  events.  Examples  are  random  walk,  random  movement 
of  a  submarine  on  maneuvers,  subsystems  in  series,  etc. 

Generally,  simulation  of  a  series  process  of  this  type  can  be  structured 
such  that  during  the  simulation  it  can  be  examined  at  various  stages.  At  one 
or  more  of  these  stages  it  may  be  possible  to  establish  whether  or  not  the 
process  is  in  an  interesting  or  uninteresting  state.  (Interesting  means  likely 
to  contribute  to  the  desired  result.)  If  the  state  of  a  given  stage  is  not  of 
interest,  then  one  might  want  to  restrict  further  investigation;  that  is,  kill 
off  the  process  with  a  known  probability  (Russian  Roulette).  If,  however,  the 
process  is  in  an  interesting  state,  one  may  want  to  conduct  additional  investi¬ 
gations;  that  is,  increase  the  number  of  simulations  starting  from  that  de¬ 
sirable  situation  (splitting). 

This  technique  can  also  be  particularly  useful  for  simulations  involv¬ 
ing  a  large  number  of  discrete  situations.  For  example,  consider  a  queueing 
system  in  which  a  large  number  of  individuals  are  being  tracked.  Then  at  a 
certain  stage  in  the  problem,  one  of  these  individuals  can  be  selected  and 
removed  from  the  system  with  probability  p.  If  this  individual  is  not  re¬ 
moved  from  the  system  he  is  allowed  to  continue  through  the  system  with  a 
weight  (l-p)“l=  l/q.  This  can  be  repeated  with  more  individuals  (with  the 
same  or  different  values  of  p)  until  the  number  of  individuals  being  tracked 
is  reduced  to  a  desired  size. 

Conversely  the  number  of  individuals  being  tracked  in  the  system 
can  be  increased  by  splitting.  For  example,  suppose  an  individual 
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has  an  assigned  weight  w,  then  he  can  be  replaced  by  n  individuals 
each  having  a  weight  w'  =  w/n.  The  n  individuals  can  then  proceed  inde¬ 
pendently  through  the  system,  except  that  the  weight  assigned  at  the  splitting 
must  be  maintained. 

It  should  be  evident  from  the  above  descriptions  that  Russian  Roulette 
and  splitting  techniques  can  be  useful  when  simulating  events  of  low  proba¬ 
bility  and  thus  its  application  can  prove  beneficial  in  many  of  the  same  situa¬ 
tions  where  importance  sampling  may  be  indicated.  Indeed,  there  is  a  great 
resemblance  between  the  two  methods  in  that  both  force  the  simulation  into 
interesting  areas  by  modification  of  the  sampling  distributions.  The  differ¬ 
ence  between  the  two  is  the  method  of  choosing  the  important  areas.  Russian 
Roulette  and  splitting  is  an  ’’after-the-fact”  or  passive  approach  which  uses 
a  straightforward  simulation  but  limits  or  increases  the  sampling  as  a  func¬ 
tion  of  the  events  which  occur  during  the  simulation.  Importance  sampling, 
on  the  other  hand,  attempts  to  force  the  paths  into  the  more  interesting  areas 
by  a  prior  alteration  of  the  underlying  random  process. 

3. 1. 2. 2  Application  to  a  Two -Stage  Problem 

To  illustrate  some  of  the  more  fundaments/ aspects  of  Russian 
Roulette  and  splitting,  consider  the  two -stage  process  in  Fig.  3.2.  Let  X 
denote  the  random  observations  from  the  first  stage,  and  Y  denote  the  ob¬ 
servations  from  the  second.  Suppose  the  parameter  to  be  estimated  is 

I  =  E[g(x,y)]  (3.21) 

Crude  sampling  would  generate  pairs  of  values  X^Yjj. . .  ;XN>YN  and 
estimate  I  using 

N 

i  =  -w  (3.22) 

i=l 
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Suppose,  however,  it  can  be  determined  from  the  characteristics  of 
the  problem  that  certain  values  of  X  would  pTbfebly  lead  to  more  interesting 
results  than  others.  On  this  basis  then,  Russian  Roulette  and  splitting  would 
be  implemented  by  dividing  the  first  stage  into  the  following  two  mutually 
exclusive  sets  of  states: 

R. :  The  set  of  states  where  Russian  Roulette  is  used  and  the 
1  simulation  is  terminated  with  probability  p  =  1  -  q.  If  the 
simulation  continues,  the  estimated  parameter  is  weighted 
by  1/q. 

Rg:  The  set  of  states  where  splitting  is  employed  by  breaking  each 
simulation  reaching  a  point  in  R2  into  n  •  simulations  to  be  con¬ 
tinued  from  this  point  in  the  process.  The  weight  assigned  to 
each  new  simulation  is  1/n  of  the  weight  of  the  original  simulation. 


This  procedure  would  be  then  repeated  for  N  starting  situations  as  shown  in 
Fig.  3. 3.  It  is  clear  the  sampling  process  has  been  modified  and  thus  the 
estimator  must  be  adjusted  accordingly.  In  this  case  the  estimator  becomes: 


y,  g(Xi,Yi)  y,  y^gfXj.Yj), 

VR1  Xt<R2  j=l 


(3.23) 


It  can  easily  be  shown  that  I  is  an  unbiased  estimator  for  I. 


Estimation  of  the  sample  variance  in  this  case  is  easy  to  accomplish. 

Defining  Ij  (i.  e. ,  Ij  =  0,  g(X^,  Yj)/q,  or  g(Xi,Yj)  as  the  contribution  to  the 
estimator  from  history  i,  and  since  ” 

N 

i  -  lE'i  (3-24) 

i=l 


then  the  sample  variance  is  estimated  using 


(3.25) 
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Repeat  N  Times 


Fig.  3. 3.  Russian  Roulette  and  Splitting  for  a  Two-Stage  Problem 


I 


Alternately,  batching  as  described  in  Section  2. 4. 2  could  be  used,  although 
(3. 25)  is  recommended. 

3. 1. 2. 3  Application  to  a  Three-Stage  Problem 

Although  the  basic  concepts  of  Russian  Roulette  and  splitting  are  as 
simple  as  presented  above,  they  can  be  applied  to  rather  large  multi.itage 
problems.  To  illustrate  this,  consider  the  three-stage  problem  shown  in 
Fig.  3. 4  where  it  is  shown  within  the  context  of  a  crude  sampling.  Assume 
Russian  Roulette  and  splitting  is  applied  between  the  first  and  second  stages. 
The  procedure  may  be  accomplished  as  follows  also  Fig.  3. 5). 

1.  First  generate  a  value  for  x,  Xj.  If  X^cR^,  the  history  is  ter¬ 
minated  with  probability  p^  =  l-q^i.  e. ,  Russian  Roulette).  If  the  history 
is  killed,  there  is  no  contribution  to  the  estimator. 


2.  If  the  history  is  not  killed,  a  value  for  y,  Yj,  is  selected.  The  history 
now  has  a  weight  1/q y  IfY^R^,  \he  history  is  terminated  (Russian 
Roulette)  with  probability  p2  =  l-q^.  If  the  history  is  not  terminated  here,  a 
value  of  z,  Z.  is  generated.  The  weight  of  the  history  is  then  (q^)  1  and 
the  contribution  to  the  estimate  for  I  is 


f(Xj,  Yj,  Zj) 

V2 

3.  If  the  history  is  not  killed  on  X  (with  weight  1/q^)  and  Y^f  Rgg 
then  the  history  is  split  into  histories.  Next,  tig  values  for  Z; 

Z.  . . . ,  Zjn  are  generated  and  assigned  weights 

of  this  history  to  the  estimator  is 

n2 


r 

j=i 


g(X.,  Yt,  Z^) 
n2ql 
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Fig.  3. 5.  Russian  Roulette  and  Splitting  for  a  Three  -Stage  Process 


4.  If  X^cR12*  then  the  history  (with  weight  1)  is  split  into  n^  values 
Yj, . . . ,  Yn^  and  assigned  weights  1/n^. 

5.  Now,  each  Yj(weight  =  l/nj),  j  =  1, . . . , n^  is  considered  in  turn. 
If  YjCRgj,  the  history  is  killed  with  probability  p2  =  l-q2*  hilled, 
there  is  no  contribution  to  the  estimator.  If  the  history  is  not  killed  here 
a  value  for  Z,  say  Z y  is  selected.  The  weight  of  is  l/Cn^qg).  The 
contribution  to  the  estimator  in  this  situation  is  now  given  by 


E 

YjfR 


g(Xl,Yj,Zj) 

nlq2 


21 


6.  If  Y{  (weight  l/nt)  cR99,  this  history  is  split  into  n9  histories. 


^  A  g(Xt,YrZ 
"  "  “ln2 


''sL 


V*» k=1 


This  procedure  is  repeated  N  times  as  indicated  in  Fig.  3. 5.  For  each  Xj 
selected  then  the  contribution  to  the  estimator  for  X^cR^, 


A 

.  I*  - 


g(Xj,  Y|,  Zj)  ^ 

1  2j  qTqJ  +  2mJ 

v  -O  i  “  v  -n  i_4 


n« 


g(X1,Y1,Zj) 


J  Mi 

VR21  VR22*=1 


and  if  XjcR^ 


"2 


A  __  g(Xt,  Yj.zp  _  _S^ 

i  *  Zj  n.q2  Zj  Zj  ntn9 

™  Y,cR„  k=l  1  2 


YR21 


j  “22 


(3.26) 


(3.27) 
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Assuming  the  entire  process  is  repeated  N  times  (i.  e. , :  N  starting  values 
for  x,  X1,...,Xn  are  selected)  the  estimator  is 

ifKt,rtlzak  ** 


N 

i  -  1  V1?  =  1 
1  '  TrZ^  i  TT 

i=l 


IW|it|ti||  y'  ^Xi.Yi 

M  |  W  llflf  M  H  BjQj 


(X|f  Y„  Z^) 


lVhhIVr» 


VR22  1=1 


♦  £ 


X,‘R12 


_  ^g(Xi,Y^Zjk)  _  g(Xt» Yj,Zj)  I 


(3.28) 


The  procedure,  although  rather  complex  to  write  down  as  formal  ex¬ 
pressions  can  be  seen  to  be  rather  straightforward. 


As  in  the  two-stage  case,  the  best  estimation  to  use  for  the  sample 
variance  is 


(3.28) 


2 

S  can  then  be  used  to  compare  the  efficiency  of  Russian  Roulette  and  Splitting 
to  the  crude  sampling. 


3. 1. 2. 4  Weight  Standards  for  General  Application 


For  a  general  application  of  Russian  Roulette  and  splitting,  it  is  best 
to  introduce  the  concept  of  weight  standards.  Let  us  presume  that  the  problem 
has  been  broken  up  into  several  regions,  Rj,  Rg, . . . ,  R^.  (These  'regions'  do 
not  necessarily  denote  geometrical  volumes,  but  rather  ranges  of  the  random 
variables  that  describe  a  state  in  the  system  being  studied. )  For  each  region, 
there  will  be  a  high  weight,  wHi,  a  low  weight,  wLi,  and  an  average  weight, 
wAi.  Now,  whenever  a  history  enters  region  i,  the  current  weight,  w,  of 
the  history  is  compared  to  the  weight  standards  as  follows: 
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1.  If  w  <  wLi,  Russian  Roulette  is  implemented  as  follows: 

•  With  probability  1  -  the  history  will  be  killed. 

wAi 

•  With  probability  ,  the  history  will  survive  with  a  new 

wAi 

weight  of  (Note  that  the  expected  weight  surviving 

from  this  process  is  w,  which  it  must  be  to  conserve 
weights). 

2.  If  w  >  wHi,  splitting  is  implemented  as  follows: 

•  Find  n  such  that  w  -  .. 

•  Create  n  histories  which  start  from  this  point  with  a 
weight  wAi. 

w-nwAi 

•  With  probability  — - - ,  create  one  more  daughter 

wAi 

history  to  start  lroE\  this  point  with  a  weight  wAi»  (This 

procedure  conserves  the  expected  weight  while  making  all 
histories  start  from  this  point  with  the  same  weight,  w^. ) 

3.  If  wLi  <  w  <  Wj^.j,  do  nothing  to  the  history. 

The  underlying  assumption  in  the  above  procedure  is  that  each  region  de¬ 
scribes  a  volume  of  approximately  constant  importance.  The  importance 
varies  from  region  to  region  in  a  manner  inversely  proportional  to  the 
average  weight,  w^.  Thus,  histories  moving  into  a  region  of  higher  im¬ 
portance  (lower  weight)  will  be  split  while  those  moving  into  a  region  of 
lower  importance  (higher  weight)  will  suffer  Russian  Roulette.  For  maximum 
efficiency  in  allocating  computer  time,  all  histories  in  a  region  of  constant 
importance  should  have  the  same  weight.  The  use  of  a  fixed  average  weight 
standazd,  rather  than  fixed  splitting  parameter,  n,  or  fixed  Russian  Roulette 
probability,  p,  ensures  this  in  a  multiregion  setting. 
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The  high  and  low  weight  standards,  wH  and  wL,  are  only  used  to 
define  upper  and  lower  limits  for  triggering  the  Russian  Roulette  and 
splitting  processes.  If  Russian  Roulette  and  splitting  are  the  only  variance 
reduction  techniques  being  employed  and  the  history  weights  are  not  other¬ 
wise  being  varied,  it  is  probably  best  to  set  Wjj  =  w^  =  w^.  On  the  other 
hand,  if  there  are  other  techniques  in  use  which  are  changing  the  history 
weights,  it  is  best  to  put  a  spread  between  w^  and  wL  within  which  the 
weight  is  allowed  to  vary.  If  the  spxead  between  w^  and  wL  is  too  small, 
there  will  be  a  loss  o£  efficiency  due  to  computing  time  spent  in  the  book 
keeping  involved  with  frequent  Russian  Roulette  and  splitting  actions.  Con¬ 
versely,  if  the  spread  is  too  large,  there  will  be  a  loss  of  efficiency  as 
equal  amounts  of  computing  time  are  expended  on  histories  with  unequal 
weight. 

To  estimate  expected  values  and  variances,  the  contribution  from  a 
single  original  history  is  computed  using 

t,  =  £  gO^W^jj)  (3.30) 

J 

where  the  summation  runs  over  all  contributions  from  split  histories,  j. 
which  originated  from  the  same  initial  history,  i.  Then  the  final  estimate 
of  I  for  N  initial  histories  is 


i=l 


and  the  sample  variance  is  given  by  (3. 29). 


44 


3. 1. 2. 5  Selection  of  Criteria  for  Russian  Roulette  and  Splitting 

One  difficulty  in  the  application  of  Russian  Roulette  and  splitting  is  in 
the  selection  of  values  for  the  parameters  being  used,  either  weight  standards 
or  Russian  Roulette  kill  probability  and  number  of  splittings.  The  ideal  approach 
would  be  to  select  these  parameters  to  minimize  the  variance  in  the  estimate 
as  was  done  with  importance  sampling;  however,  this  is  generally  not  practical. 
Consequently,  intuitive  information  along  with  practical  limitations  (e.g. ,  com¬ 
puter  storage)  and  simplifications  must  be  resorted  to.  For  example,  if  it  is 
'felt'  that  a  certain  range  of  Y  is  twice  as  important  than  the  remainder  of 
the  range  of  Y,  then  a  splitting  with  n  =  2  of  histories  inside  the  important 
range  or  a  Russian  Roulette  kill  factor  of  .  5  outside  the  range  would  be  not 
unreasonable.  A  clue  to  the  optimum  standards  to  be  selected  is  given  by  the 
results  of  analysis  for  importance  sampling  (3. 20)  or  stratified  sampling  (see 
Section  3. 1. 4).  In  both  cases  the  resulting  weights  will  be  proportional  to 
E[g2(x)]  "1^2.  Thus  the  weight  standards  in  a  given  region  should  be  inversely 
proportional  to  the  root  mean  square  average  of  the  'pay-off*  or  result  function 
i.  e. ,  weight  standards  should  be  high  in  regions  of  low  value  and  low  in  regions 
of  high  value. 
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3.1.3  Systematic  Sampling*7’ 12’ 14, 20>  23> 24» 36) 

3. 1. 3. 1  General  Concept 

Systematic  sampling  is  a  procedure  that  modifies  selection  from  the 
sample  space  in  a  somewhat  structured  manner.  This  serves  to  reduce  the 
random  variation  that  is  introduced  into  the  results  when  crude  Monte  Carlo 
sampling  is  used.  An  important  characteristic  of  systematic  sampling  is  that 
if  used  it  will  always  result  in  a  reduction  in  variance  from  the  that  obtained  using 
crude  sampling.  Also,  the  method  rarely  involves  any  significant  effort  to 
implement.  Unfortunately,  the  improvement  is  generally  less  than  impressive 
although  as  a  general  rule  it  should  be  used  whenever  the  opportunity  arises. 

Its  potential  application  can  generally  be  associated  with  initial  or 
starting  conditions  in  a  problem.  For  example,  systematic  sampling  could 
be  applied  to  the  distribution  of  interarrival  times  of  individuals  entering  a 
queueing  system,  the  initial  position  of  a  submarine  in  simulation  of  an  ASW 
exercise,  etc.  Generally,  any  Monte  Carlo  problem  which  has  a  probability 
distribution  to  characterize  the  initial  conditions  can  be  considered  as  a  candi¬ 
date  for  application  of  systematic  sampling. 

Two  methods  commonly  used  for  systematic  sampling  will  be  described 
below.  As  will  be  seen,  systematic  sampling  is  similar  to  stratified  sampling 
to  be  described  next.  Stratified  sampling  can  be  considered  an  optional  form 
of  systematic  sampling. 

In  each  of  the  methods  to  be  presented  below,  the  usual  form  of  the 
integral, 


I 


g(x)f(x)dx 


will  be  considered. 


(3. 32) 
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3. 1. 3. 2  Method  I  for  Systematic  Sampling 

In  the  first  method  tor  applying  systematic  sampling,  assume  the 
range  of  the  density  function  f(x)  is  broken  up  into  N  equal  regions  each 
having  an  area  1/N  (N  should  typically  vary  between  5  and  50).  This 
scheme  is  shown  in  Fig.  3.  6  for  both  the  probability  density  f(x)  and  the 
cumulative  distribution  function  F(X)  =  i(x)dx 


Fig.  3.  6.  Interval  characterization  for  systematic  sampling 
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It  is  clear  that 


TT  -/,(x 

XcL, 


f(x)dx  ;  j  =  1,. . .  ,N 


(3.  33) 


Now,  assume  a  sequence  of  random  numbers,  R^, . . .  ,Rn  is  selected  from 
the  uniform  distribution  on  the  interval  (0, 1).  This  form  of  systematic 
sampling  will  then  generate  the  following  sequence  of  numbers 


p  -  V .(±1)  •  i  =  1, . . .  ,n 

Kij  "  IT  +  N  ’  j  = 


(3.  34) 


For  each  value  of  i,  this  procedure  effectively  assigns  a  value  of  R^  to 
each  interval  j. 


The  next  step  is  to  determine  X*j  from 


-L 


f(x)dx  ; 


.  i-l,.«.,n 


(3.35) 


The  estimator  for  I  is 


n  n  N 

-S-  Z/i  * 

i=l  i=l  j=l 


g(Xi|) 


where 


(3.  36) 


I.  -  K(Xjj)  ;  i  -  1, . . .  ,N 

j=l 


(3. 37) 


is  the  contribution  from  the  ith  batch  of  histories. 
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The  e ample  variance  is  computed  using 


(3. 38) 


3. 1. 3. 3  Method  n  for  Systematic  Sampling 

A  second  and  generally  better  method  to  perform  systematic  sampl¬ 
ing  is  to  allocate  N  independent  samples  to  each  interval  defined  in  Fig.  3. 6 
rather  than  scale  each  random  number  Rt  into  N  intervals.  This  can  be 
accomplished  by  selecting  R^;  i  *  1, . . . ,  n;  J  =  1, . . .  ,N  random  numbers  from 
a  uniform  distribution  U(0,1).  Then,  n  random  numbers  are  allocated  to 
each  of  the  N  Intervals  using 


i  * 

J  *  1,  •  • . ,  N 


The  values  of 


are  then  determined  from 


(3. 39) 


(3. 40) 


The  estimators  for  the  sample  mean  and  variance  in  this  case  are  given  by 
3. 36  and  3. 38  respectively. 

Of  the  two  methods  described  above,  the  second  will  always  gi/e  the 
better  answer  in  the  sense  of  smaller  variance.  However,  Method  n  re¬ 
quires  that  a  larger  number  of  random  samples  be  selected  from  U(0, 1). 
Generally  it  is  recommended  that  Method  n  be  used  in  spite  of  the  slightly 
additional  computation  effort  required.  In  both  cases,  the  efficiency  of 
systematic  sampling  compared  to  crude  sampling  is  approximately  propor¬ 
tional  to  N2. 
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3.1.4  Stratified  Sampling 


(1,  6t  7, 11, 12, 14, 15, 20, 24, 27, 32, 33) 


3. 1.4.1  General  Concept 

Stratified  sampling  (sometimes  called  quota  sampling)  is  similar  to 
systematic  sampling  with  additional  considerations  directed  toward  structur¬ 
ing  the  sampling  process  so  that  regions  of  large  variance  will  receive  more 
samples.  In  this  sense,  therefore,  stratified  sampling  seeks  to  combine 
the  systematic  and  importance  sampling  schemes.  Alternately,  stratified 
sampling  can  be  viewed  as  a  special  case  of  systematic  sampling  where  opti¬ 
mal  distribution  of  samples  is  attempted. 

Generally,  all  the  problem  characteristics  that  serve  to  define  the 
applicability  of  systematic  sampling  apply  to  stratified  sampling.  However, 
if  additional  information  on  which  portions  of  the  sampling  distribution  tend 
to  contribute  more  to  the  total  variance  is  available,  additional  reduction  in 
the  variance  can  be  achieved  using  stratified  sampling. 

Assume  the  sampling  range  for  f(x)  is  broken  up  into  N  regions  of 
length  Lj, . . .,L^.  In  this  case,  however,  assume  is  selected  accord¬ 


ing  to  some  specified  P.  where 


J 


■L 


f(x)dx 
XCLj 


j  =  1,. . .  ,N 


(3. 41) 


Schematically  this  structure  is  simttar  to  that  in  Fig.  3. 6.  In  fact,  if 
Pj  =  1/N,then  this  structuring  would  be  the  same  as  systematic  sampling. 

A  general  rule  to  follow  for  selecting  the  is  to  select  them  such  that  the 
variation  in  g(x)f(x)  is  the  same  in  each  of  the  intervals. 

Once  the  intervals  L^, . . ., are  selected,  the  next  requirement 
is  to  define  the  number  of  samples  to  assign  to  each  interval. 
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i 


More  specifically,  let  be  the  number  of  samples  assigned  to  inter¬ 


val  Lj  where , 


^  n j  =  n  (3.42) 

J=1 

The  n^  samples  can  be  assigned  to  Interval  as  follows: 

Select  from  U(0, 1).  Then,  X^cLj  are  determined 

hn  1 


ti  ft * 

RjjPj  -J  f(a)dx  j  i  s  1,  •  > . , D| 

An  unbiased  estimate  for  I  is 
N  p  h  1  N 

>-£i}i;-<V  -s-A 

j=l  *  i=l  j=l 


(3.43) 


(3. 44) 


where 


■£2><V 

1 7=1 

To  see  that  (3. 44)  is  unbiased  consider 


Efy  =  lj  =  j  g(x)dx 

Lj  ^ 


from  which  it  follows  that 
N 

1  -2>ili 
H 


(3.45) 


(3.46) 


(3. 47) 
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To  select  the  iy  consider 


y  *  VI 

N  )2' 

E[(i-D2]  =  E 

=  E 

E  -  V 

IV*  J\ 

j=i  )J 

N  p2  2 

-  Z  ¥*■  <MB> 

J=1  i 

where 


*  (  ^  few  - 1.]2* 

K j  1 


-  ij)2] 


(3.49) 


is  the  variance  in  the  interval  L.. 

Now,  if  the  are  selected  to  minimize  (3. 48)  subject  to  (3. 42),  then  it 
can  be  shown  (24)  that  n^  should  be  selected  to  satisfy 


(3.50) 


Thus,  the  sample  size  in  each  interval  should  be  selected  to  be  proportional 

to  the  fraction  of  the  variance  in  each  interval.  The  obvious  difficulty  is,  of 

2 

course,  that  the  Oj  are  not  known.  However,  they  can  be  estimated  using 

=  spE  tg(V  -V2  = 

i  i.i  j 

where  nj  samples  are  arbitrarily  selected  in  each  interval.  An  iterative 
scheme  can  be  structured  to  estimate  as  the  sampling  is  carried  out. 
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The  sample  variance  using  stratified  sampling  may  be  estimated  using 


S 


j=i  J  t=i  i=i  1 

or  a  batching  procedure  (see  Section  2. 4. 2)  could  be  used. 


n 


j 


]  i=l 


(3.  52) 


As  in  the  case  of  systematic  sampling,  the  efficiency  of  stratified 
sampling  in  comparison  with  crude  sampling  is  —  N^. 

3 . 2  ANALYTICAL  EQUIVALENCE  TECHNIQUES 


This  group  of  variance  reduction  techniques  is  based  on  using  prior 
knowledge  of  the  processes  involved  to  form  analytical  or  approximate  solu¬ 
tions  to  the  problem  being  simulated.  This  is  another  means  to  utilize  informa¬ 
tion  about  the  process  and  is  also  based  on  the  fact  that  it  is  generally  beneficial 
to  use  analytical  solutions  to  parts  of  the  problem  whenever  sufficient  prior 
knowledge  allows.  This  may  mean  that  a  related  process  is  solved  exactly 
using  analytical  or  other  low  variance  techniques  and  that  the  difference  be¬ 
tween  the  exact  and  related  processes  is  derived  by  Monte  Carlo  techniques. 

All  of  the  techniques  discussed  below  are  based  on  this  concept  and  many  are 
very  closely  related  in  the  principles  and  ideas  involved. 

3.2.1  Expected  Value*18, 19> 20, 35> 36) 

This  technique  is  based  on  the  fact  that  analytic  determinations  are 
usually  preferred  to  results  gained  through  simulation.  Thus  any  portion  of  a 
process  which  can  be  analytically  determined  should  be  replaced  by  its  analy¬ 
tical  representation  in  the  model  whenever  that  can  be  done  without  losing  an 
essential  element  from  the  simulation.  The  name  "expected  value"  refers  to 
the  basic  notion  that  Monte  Carlo  simulation  of  any  parameter  is  equivalent  to 
estimating  its  expected  value,  i.  e. ,  evaluating  an  integral.  Thus  any  portion 
of  the  simulation  which  can  be  evaluated  analytically  can  be  replaced  by  its  ex¬ 
pected  value,  and  this  is  likely  to  improve  the  efficiency  of  the  simulation. 
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To  demonstrate  the  application  of  the  expected  value  technique,  con¬ 
sider  the  two-dimensional  integration. 

I  =  f(x,  y)g(x,  y)dxdy  .  (3.53) 

This  could,  for  example,  be  a  two-stage  problem  such  as  that  described  under 
Russian  Roulette  and  splitting  (Section  3. 1. 2)  and  shown  schematically  in 
Fig.  3.7. 


Score 

g(x,y) 


Fig.  3. 7.  Crude  simulation  of  a  two-step  process 


where  first  a  random  sample  X  is  selected  from  the  density  function  f(x)  and 
then  a  random  sample  Y  is  selected  from  the  conditional  distribution  f(y|X). 
Now,  if  this  is  repeated  N  times,  the  crude  Monte  Carlo  estimator  for  I  is 

N 

i  *  -iJ-  £  g(xi’  Yi>  •  (3- 54) 

i=i 

Assume,  however,  that  it  is  possible  to  compute  the  expected  value  of  the  con¬ 
ditional  probability  in  the  second  stage  given  the  result  of  the  first  stage,  Xj. 
That  is,  suppose  E[g(y|x)]  Is  known  analytically. 

Then  the  simulation  could  be  performed  by  simply  generating  N  values 
for  X  ,  Xj, . . . ,  Xjj  and  using  the  expected  value  estimator  given  by 

N 

*e  =  ^  Ete(ylxi)]  <3*55> 

i=l 
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This  is  an  unbiased  estimator  for  I  since  Efl£]  *  I  . 
The  sample  variance  is  given  by 


N 


i°i 


.  i=l 


(3.  56) 


It  is  easy  to  show  that  this  approach  will  always  give  results  that  are  better 
than  the  straightforward  Monte  Carlo  procedure. 

The  trivial  nature  of  the  above  description  should  not  be  interpreted  as 
indicating  limited  potential  for  this  technique.  Indeed,  its  application  often 
results  in  a  difficult  simulation  becoming  mb  easy  one.  Furthermore,  it  can 
find  application  in  a  vast  number  of  problems.  For  example,  in  computing 
the  average  time  spent  in  a  queueing  system,  the  simulation  of  the  server(s) 
can  be  replaced  by  the  mean  service  time.  In  radiation  transport,  the 
stochastic  absorption  of  particles  is  almost  always  replaced  by  a  weighting 
system  involving  the  expected  absorption  percentage. 

It  is  not  always  possible,  of  course,  to  calculate  the  expected  value  of  a 
process  in  the  simulation  -  if  all  expected  values  could  be  calculated  analytically 
there  would  be  no  need  for  simulation.  Even  if  the  expected  value  can  be  cal¬ 
culated,  it  may  not  be  possible  to  replace  the  process  by  its  expected  value. 

The  entire  distribution  involved  in  the  process  may  be  important  in  the  simula¬ 
tion,  or  in  other  words,  the  second  and  higher  moments  may  be  important  to  the 
final  answer  and  not  just  the  first  moment  or  expected  value.  In  a  few  cases, 
replacing  the  stochastic  process  can  actually  reduce  the  efficiency.  This  may  be 
true  whenever  the  stochastic  process  is  one  of  the  decision  points  where  the 
simulation  may  be  terminated.  Replacing  the  termination  decision  by  its  ex¬ 
pected  value  involves  assigning  a  weight  to  the  history  and  modifying  that  weight 
to  allow  for  the  expected  percentage  of  terminations  at  each  decision  point. 

When  the  survival  probability  is  small,  this  can  lead  to  computing  time  being 
wasted  in  simulating  a  history  which  may  have  a  vanishingly  small  v eight  after 
passing  a  few  decision  points. 
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Several  significant  aspects  must  be  considered  before  the  expected 
value  techniques  can  be  implemented.  Generally,  these  are: 

1.  Identify  those  parts  of  the  overall  simulation  for  which  the 
expected  value  can  be  determined  efficiently. 

2.  For  each  such  process  identified  in  1. ,  a  determination  must 
be  made  as  to  whether  the  random  nature  of  the  process  is  an 
essential  element  of  the  overall  simulation  or  whether  it  may 
be  replaced  by  a  deterministic  process  without  loss  of  desired 
realism  in  the  model,  i.  e. ,  does  the  fact  that  the  stochastic 
process  results  in  a  range  of  outcomes  rather  than  a  single 
expected  value  affect  the  final  answers  of  the  simulation?  Or, 
put  in  different  terms,  replacing  the  random  process  by  its 
expected  value  preserves  the  first  moment  of  the  distribution 
but  alters  all  the  higher  order  moments.  If  these  higher  order 
moments  are  important  to  the  overall  answer  (e.  g. ,  in  deter¬ 
mining  a  probability  distribution)  then  the  stochastic  process 
cannot  be  replaced  by  its  expected  value.  On  the  other  hand, 

if  the  higher  order  moments  do  not  contribute  to  the  final  result, 
then  replacement  by  the  expected  value  can  be  considered.  For 
a  particular  physical  system,  the  determination  of  which  stochastic 
elements  are  essential  may  depend  on  the  particular  parameters 
being  estimated. 

3.  Finally,  it  must  be  determined  if  the  replacement  of  the  random 
process  by  its  expected  value  will  )27£rease  the  efficiency.  This 
is  generally  true,  but  not  always.  If  the  process  in  question  is 

a  branch  point  where  the  history  may  go  in  either  of  two  (or  more) 
directions,  then  replacing  the  stochastic  event  by  its  expected 
value  requires  splitting  the  history  with  each  part  going  in  one  of 
the  directions  and  carrying  the  probability  at  that  branch  as  a 
weight.  Should  enough  of  these  events  be  encountered  the  number 
at  tiplit  histories  which  must  be  computed  can  easily  expand  be¬ 
yond  a  reasonable  bound.  Alternatively,  one  of  the  branches  of 
the  decision  Cftff  be  to  terminate  the  history;  in  this  case  the  his¬ 
tory  is  not  split  but  continues  from  the  branch  point  with  a  weight 
representing  the  survival  probability.  This  can  easily  lead  to 
histories  with  very  low  weights  which  usually  represents  a  loss 
in  efficiency  in  the  calculation.  Again,  this  determination  is 
likely  to  depend  on  the  particular  parameters  of  interest  in  the 
calculation. 
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Once  the  decision  has  been  made  to  replace  the  stochastic 
process  by  its  expected  value,  the  implementation  depends  on 
the  role  of  the  process  in  the  overall  simulation.  Specifically, 


1.  If  the  process  is  one  of  selection  of  a  random  variable, 
then  the  process  becomes  merely  a  deterministic  setting 
of  the  variable  to  its  expected  value  and  the  simulation 
proceeds  as  before  with  no  change  in  estimators. 

2.  If  the  process  represents  a  decision  between  terminating 
or  not  terminating  the  hiBtory,  then  the  history  continues 
but  with  a  reduced  weight  representing  the  probability  of 
survival.  That  is, 


wnew  *  wold  -  p8 


(S.  57) 


where  p  is  the  probability  of  survival  (non-termination)  at 
the  decision  point  and  wold  and  wnew  are  the  weights  of 
the  history  before  and  after  the  replaced  random  process. 

For  any  parameter  being  calculated,  an  estimate  for  each 
history  can  be  made  by  summing  the  contributions  from 
that  history.  That  is. 


li  ~  £  wij  g^ij) 

j 


(3.  58) 


where  is  the  weight  of  the  ith  history  at  the  time  of  the 
jth  contribution  to  the  final  result.  Then  the  final  estimate 
and  the  sample  variance  are  given  by 

I  =  n-E  <*■  59) 

1=1 


(3.  60) 
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If  the  contributions  to  a  parameter  from  a  history  would 
have  come  from  the  terminations  in  the  process  which  was 
replaced  by  its  expected  value,  then  the  loss  of  weight  at 
each  such  step  is  the  proper  estimate  for  the  expected 
terminations.  In  this  case  we  set 

!i  ■  £  <wold,  1)  '  wnew,  ij>  ’  *<V  ■  £  wold,  ij(1*ps)‘  *<V 


where  j  denotes  the  Jth  occurrence  of  the  replaced  event 
in  the  ith  history.  The  estimators  for  I  and  S2  remain 
as  in  (3. 59)  and  (3. 60)  above. 

3.  If  the  process  represents  a  decision  between  two  or  more 
branch  points,  then  the  history  must  be  split  and  followed 
from  that  point  on  as  two  separate  histories,  each  talcing  a 
different  branch  and  carrying  a  weight  equal  to  the  proba¬ 
bility  of  that  branch.  Parameters  are  estimated  by  summing 
weighted  contributions  from  all  daughter  histories  resulting 
from  an  original  parent  history,  using  formulas  identical  to 
(3. 58),  (3.  59),  and  (3. 60). 

In  cases  2  and  3  above,  histories  may  develop  weights  which  are 
very  small.  As  this  may  entail  spending  a  good  deal  of  computing  time 
calculating  histories  that  can  make  only  a  trivial  contribution  to  the  result, 
the  efficiency  may  be  very  low.  To  remedy  this,  Russian  Roulette  (see 
Section  3. 1.2)  can  be  used  to  eliminate  those  histories  whose  weights  be¬ 
come  too  small. 

Figure  3. 9  shows  a  schematic  flow  of  a  multistage  simulation  when  a 
branch  process  that  is  a  possible  termination  point  for  a  history  is  replaced 
by  its  expected  value.  This  may  be  contrasted  to  Fig.  3. 8  which  shows  the 
crude  Monte  Carlo  approach  to  the  same  simulation  and  Fig.  3. 10  which 
shows  statistical  estimation  used  on  the  same  problem. 
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Fig.  3, 10.  Multistage  Simulation  with  Statistical  Estimation 


3.2.2  Statistical  Estimation*12, 18>  19, 20> 34> 35> 36' 

It  is  not  essential,  and  frequently  not  efficient,  for  a  simulation  of 
a  physical  process  to  be  carried  out  to  the  natural  termination  of  the  process 
in  estimating  final  outcomes.  It  is  always  proper  to  stop  the  simulation  at 
any  point  and  to  calculate  through  analytic  or  numerical  means  the  expecta¬ 
tion  of  reaching  any  final  outcome.  Indeed,  the  sooner  the  simulation  is 
stopped  and  the  more  analytic  calculations  are  done,  the  lower  the  variance 
will  be.  Obviously,  however,  the  sooner  the  simulation  is  stopped  the  more 
complex  and  difficult  the  analytic  calculations  become  and  the  point  is  quickly 
reached  wher'  the  overall  efficiency  is  less  despite  the  gain  in  variance  re¬ 
duction.  At  the  last  step  in  the  simulated  process,  the  probability  of  reaching 
the  various  final  outcomes  needs  to  be  determined  in  order  to  do  the  simula¬ 
tion.  Thus,  it  is  generally  advantageous  to  use  analytic  expectations  for  the 
final  step.  Whether  the  analytic  calculations  should  be  carried  beyond  the 
final  step  will  depend  on  the  particular  process  and  results  desired,  but 
general .v  it  is  less  efficient  to  use  analytic  expectations  for  more  than  the 
last  step. 

If  the  process  being  simulated  is  a  once-through  process,  i.  e. ,  the 
final  step  can  be  reached  only  once  each  history,  then  the  use  of  ejected 
outcomes  is  equivalent  to  the  expected  value  technique.  If  the  process  is 
iterative  or  repetitive  with  many  passes  through  a  branch  point  where  a  final 
outcome  is  possible,  there  are  two  ways  of  using  the  analytic  computations. 
One  is  by  die  expected  value  technique  as  outlined  in  the  previous  section. 

The  other  is  called  statistical  estimation  and  should  be  used  whenever  the 
expected  value  technique  would  be  inefficient.  In  statistical  estimation  the 
stochastic  process  is  not  removed  from  the  simulation,  but  the  expected  value, 
rather  than  the  result  of  the  simulation,  is  then  used  in  the  estimation. 

Consider  a  simulation  consisting  of  many  repetitive  steps  in  which 
one  step  is  a  random  choice  between  arriving  at  some  final  outcome,  , 
or  continuing  through  the  simulation  process  with  some  other  value  of  y  . 


60 


Let  the  probability  of  at  this  step  be  P(Yf|  X)  where  X  denotes  all  the 
other  random  variables  determined  at  earlier  steps  in  the  process.  In  crude 
Monte  Carlo,  a  random  number,  R  ,  would  be  generated  at  this  step,  and 
if  R  <  P(Yj|  X)  ,  then  the  history  would  be  terminated  with  a  score  of  1.  If 
R  >  P(Yf  1 5?)  ,  the  history  would  continue  with  no  score  being  made.  After 
N  histories  the  estimate  for  the  probability  of  reaching  Y{  would  be 

Pc(Yf)  =  j5  (3.62) 


where  n  is  the  number  of  histories  which  terminated  at  Yf  .  In  statistical 
estimation,  no  change  is  made  in  the  simulation  process,  i.  e. ,  a  random 
number,  R  ,  is  drawn  and  tested  to  see  if  the  history  continues  or  is  termin¬ 
ated.  However,  the  estimation  or  scoring  technique  is  changed.  Every  time 
the  particular  step  is  encountered,  a  contribution  of  P(Yf  | )?)  is  added  to  the 
estimate,  regardless  of  what  the  actual  outcome  of  the  simulation  was.  Then 
the  final  estimate  is  given  by 


PSE(Yf* 


-if  ZW 


(3.63) 


i=l  j 


where  the  j  summation  runs  over  all  occu ranees  of  the  (possibly-)  final 
step  in  the  course  of  the  i^1  simulation.  An  estimate  of  the  variance  may 
be  calculated  from 


N 


s2  ■  ntt  E  (*t  -  PsE(Yf>)2  ■  m 

i=l 


N 

i  y  p2 

N  *i 
i=l 


PSE(Yf^ 


(3.64) 


where 


P  = 

i 


(3.65) 


.th 


is  the  estimate  resulting  from  the  i  history. 
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The  schematic  flow  for  the  statistical  estimation  technique  in  a  multi¬ 
stage  process  is  shown  in  Fig.  3.10  where  it  can  be  contrasted  with  the  use 
of  crude  Monte  Carlo  (Fig.  3.8)  and  expected  value  technique  (Fig.  3.9)  for 
the  same  process. 

If  the  calculations  do  not  get  too  complicated,  the  statistical  estimation 
procedure  can  be  extended  to  using  the  probability  that  the  simulation  will 
reach  the  desired  end  in  one  or  two  more  stages.  If  the  analytic  calculations 
of  such  expected  values  are  difficult  computationally,  then  statistical  estima¬ 
tion  may  be  less  efficient  than  crude  estimators.  In  employing  statistical 
estimation,  the  actual  simulations  which  reach  the  desired  end  point  must 
be  neglected  to  avoid  double  counting  and  only  the  'statistically  estimated' 
results  used. 


The  use  of  statistical  estimation  will  always  improve  the  variance 
but  it  can  be  particularly  useful  if  the  probability  of  reaching  the  desired 
end  point  is  small  at  all  intermediate  stages.  It  becomes  not  just  useful  but 
essential  when  the  probability  of  the  end  point  becomes  vanishingly  small. 

In  such  a  case  no  actual  simulations  would  reach  the  desired  end  point  and 
the  crude  Monte  Carlo  estimator  would  give  a  zero  result.  If  there  were 
many  intermediate  stages  which  could,  with  very  low  probability,  reach  the 
desired  end  point,  then  statistical  estimation  might  calculate  the  desired 
result  with  good  accuracy. 


3.2.3  Correlated  Sampling 


(8,9, 12, 14, 16, 18, 19, 20, 34, 36) 


3 . 2 . 3 . 1  General  Concept 


Correlated  sampling  can  be  one  of  the  most  powerful  variance  reduc¬ 
tion  techniques  due  to  the  wide  applicability  of  the  technique  as  well  as  to  the 
large  efficiency  gains  which  can  be  obtained.  Frequently  the  primary  objec¬ 
tive  of  a  simulation  study  is  to  determine  the  effect  of  a  small  change  in  the 
system.  A  crude  sampling  approach  would  make  two  independent  runs,  with 
and  without  the  change  in  the  system  being  modeled,  and  subtract  the  results 
obtained.  Unfortunately  the  difference  being  calculated  is  often  small  compared 
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to  the  two  separate  results  while  the  variance  of  the  difference  will  be  the 
sum  of  the  variances  in  the  two  runs.  Thus  the  relative  uncertainty  in  the 
difference  is  generally  very  large  and  it  can  easily  happen  that  the  effect 
being  calculated  is  smaller  than  its  statistical  uncertainty.  In  such  cases 
the  use  of  correlated  sampling  can  be  essential  to  obtaining  a  statistically 
significant  result.  If,  instead  of  being  independent,  the  two  simulations 
use  the  same  random  numbers  at  comparable  stages  in  the  computation,  the 
results  can  be  highly  correlated.  The  effect  of  this  correlation  is  to  reduce 
the  variance  of  the  difference  in  the  two  results  while  not  changing  the  vari¬ 
ance  in  either  individual  result.  As  a  consequence  the  effect  of  the  difference 
in  the  system  will  be  known  to  a  much  greater  accuracy  than  it  would  be 
otherwise.  Another  way  of  viewing  correlated  sampling  through  random 
number  control  is  to  realize  that  the  use  of  the  same  random  numbers  will 
generate  identical  histories  in  those  parts  of  the  two  systems  which  are  the 
same.  Thus  any  difference  in  the  results  will  be  due  directly  to  the  differ¬ 
ence  in  the  systems  and  not  to  random  noise  from  the  unchanged,  but  sto¬ 
chastic,  elements  in  the  rest  of  the  stimulation.  This  obviously  leads  to 
a  gain  in  efficiency  compared  to  the  uncorrelated  case. 

There  are  several  types  of  situations  where  the  use  of  correlated 
sampling  is  indicated.  These  include: 

•  The  effect  of  a  small  change  in  the  system  is  to  be  calculated. 

•  The  difference  in  a  parameter  in  two  or  more  similar  cases 
is  of  more  interest  than  its  absolute  value  in  any  one  case. 

•  A  parametric  study  of  several  problems  is  to  be  performed. 

This  has  greatest  potential  when  the  problems  are  relatively 
similar  in  nature. 

•  The  answer  to  one  of  several  similar  problems  is  known  accur¬ 
ately.  The  answers  to  the  unknown  problems  can  often  be  esti¬ 
mated  from  the  known  result. 
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3. 2. 3. 2  Analytical  Formulation 


To  provide  insight  into  tie  concept  of  correlated  sampling,  consider 
the  following  integrals  which  characterize  different  (but  hopefully  similar 
or  related)  problems: 

Ij  =  J’f1(x)g1(x)dx 

(3.66) 

and 

l2  =  Jf2^^2^)dy 

(3.67) 

of  primary  interest  is  the  difference 

A  =  X1  "  l2  * 

(3.68) 

The  obvious  crude  approach  is  to  select  N  values  of  X  from  fj(x), 

say  Xj,...,XN  and  N  values  of  Y  from  fg(y),  say  Yj,...,Yn  and 
compute 

N  N  N 

*  ■  h  -h  ■  sE  W  -  *2'Yi»  ■  V  L«i(xi>  -  sZ^i' 


The  variance  in  A  is 

<72(A)  =  Ojdj)  +  -  2  cov  (i^ig) 

(3.70) 

where 

"ft)  =  E[(5j  -  Ij)2] 

(3.71) 

-  e[«2  -  V2] 

(3.72) 
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and 


ccvflj.Ij)  =  E[(5j  -Ij)  (i2  - 12>]  =  E[(i1i23  -  Ijl2  (3.73) 

A  A 

Now  if  and  Ig  are  statistically  independent  (i.  e.,no  correlation)  then 
cov  (Ijjlg)  =  0  (3.74) 


and 


o2(A)  =  -MT2(i2)  (3.75) 

A  A 

However,  if  the  random  variables  L  and  Ig  are  positively  correlated  then 

cov(iri2)  *  0  (3.76) 

and  the  variance  in  the  correlated  case  will  be  less  than  that  realized  with 
no  correlation. 


3 . 2 . 3 . 3  Implementation  of  Correlated  Sampling 


The  key  to  reducing  the  variance  of  the  estimate  of  A  in  (3.69)  is 
to  ensure  positive  correlation  between  the  estimators  1^  and  Ig.  This 
can  be  achieved  in  several  ways  although  the  easiest  to  implement  is  to  ob¬ 
tain  correlated  samples  through  random  number  control.  Specifically,  this 
can  be  accomplished  by  using  as  many  of  the  same  random  numbers  as  pos¬ 
sible  in  paired  situations  in  the  two  simulations.  One  way  this  might  be 
accomplished  is  by  using  the  same  sequence  of  pseudo-random  numbers  in 
the  two  simulations.  For  example,  in  the  above  problem  the  same  sequence 
of  uniform  random  numbers,  Rj, . . . , from  U(0, 1)  could  be  used  to 
generate  the  two  sequences  Xj,  ...,XN  and  Yj,...,Yn  by  using 


fj(x)dx 


f2(y)dy 


(3.77) 
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Clearly  the  random  variables  Xj  and  Yi  are  positively  correlated  since 
they  both  used  the  same  Rj  .  In  fact,  if  fj  is  very  similar  to  f2  ,  the 
random  sequences  will  be  very  highly  correlated. 

As  another  example,  consider  a  multistage  problem  where  many  of 
the  events  which  occur  at  various  stages  are  not  subject  to  the  differences 
in  problem  structure.  Then,  identical  random  numbers  should  be  used  at 
those  stages  which  are  not  impacted  by  the  problem  differences  to  produce 
some  positive  correlation  between  the  two  simulations  and  to  eliminate  sta¬ 
tistical  noise  from  parts  of  the  system  which  are  unchanged. 

It  is  difficult  to  be  specific  as  to  how  random  number  control  should 
be  applied  in  a  general  problem.  As  a  general  rule,  however,  to  achieve 
the  maximum  correlation,  the  same  random  numbers  should  be  used  whenever 
the  similarities  in  problem  structure  will  permit  this  to  occur. 

Use  of  the  same  sequence  of  random  numbers  in  two  separate  runs  means 
that  the  histories  generated  will  be  identical  up  to  the  point  where  the  difference 
in  the  system  first  comes  into  play.  This  complete  correlation  will  obviously 
eliminate  all  variance  in  the  difference  due  to  the  first,  common  part  of  the 
simulation.  In  addition,  it  is  possible  to  save  computational  time  by  doing  the 
first  simulation  and  storing  the  knowledge  of  the  state  of  the  system  at  the 
first  point  in  the  history  where  the  difference  in  the  two  systems  affects  the 
simulation.  The  second  simulation  could  then  start  at  this  point  rather  than 
recomputing  the  identical  first  part  of  the  history.  However,  this  frequently 
requires  more  programming  effort  to  implement  than  is  justified. 

If  it  is  possible  to  return  to  the  same  sequence  of  random  choices  after 
the  calculations  concerned  with  the  perturbation,  then  obviously  all  the  variance 
in  the  simulation  will  be  associated  with  the  perturbation,  with  maximum  ef¬ 
fectiveness.  However,  this  is  generally  not  possible.  Usually  the  perturba¬ 
tion  forces  a  difference  in  decision  and  the  two  histories  proceed  in  divergent 
directions  following  the  perturbation.  At  the  completion  of  one  history  and 
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the  start  of  the  next,  it  is  then  necessary  to  re -synchronize  the  random  num¬ 
ber  sequences  to  begin  the  next  histories  identically. 

In  order  to  estimate  the  variance  in  A  obtained  through  the  use  of 
correlation,  it  is  necessary  only  to  view  A  as  if  it  were  being  directly  simu¬ 
lated  and  to  calculate  the  sample  variance  of  the  difference  as 


i=l 


where 

\t  =  gjCty  -  g2(Yt)  (3.79) 

3.2.4  History  Reanalysis^8^ 

3. 2. 4. 1  General  Concept 

History  reanalysis  is  essentially  a  form  of  correlated  sampling  except 
that  one  does  not  actually  run  a  second  simulation  using  the  same  random 
numbers  as  in  the  first.  Instead,  the  detailed  results  of  the  first  simulation 
are  reanalyzed  to  calculate  an  answer  for  the  second  process.  In  this  case 
die  first  process  is  treated  as  an  altered  or  biased'  modification  of  the  second 
process.  In  addition  to  the  reduced  variance  obtained  by  the  correlation, 
history  reanalysis  reduces  the  computational  time  involved  by  not  actually 
performing  the  second  simulation.  This  can  often  lead  to  quite  high  effici¬ 
encies  for  this  technique. 

Since  history  reanalysis  is  a  form  of  correlated  sampling,  it  will 
apply  to  the  same  types  of  problems  indicated  in  Section  3. 2.3. 1.  However, 
there  is  an  additional  constraint  that  the  differences  in  the  systems  being 
simulated  must  be  expressible  as  a  difference  in  a  probability  distribution 
or  in  the  scoring  function. 
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3. 2. 4. 2  Analytical  Formulation 

For  the  purposes  here,  it  is  assumed  that  there  are  two  problems 
of  interest  which  involve  estimating  Ij  and  Ig  as  given  by  3.66  and  3.67. 

It  is  assumed  that  a  random  sample  Xj, . . .  ,3^  has  been  obtained  from 
f  j(x).  The  estimator  for  Ij  is  as  usual 

N 

*i  =  s  £  gi(xi>  (3-80) 

i=l 


Since 


„  g9(x)f„(x) 

12  -  Jf2(x)g2(x)dx  =  Jf^x)-^^ — dx 

an  estimate  for  I2  can  be  obtained  using 


N 

h-h  £ 

i=l 


s2(xiW 


(3. 81) 


(3.82) 


where  f^Xj)  ±  0  is  implied  whenever  ggfX^f^X.)  ^  0.  This  is  of  course 
very  reminiscent  of  the  formulas  for  importance  sampling  (see  Eq.  3. 5). 


The  sample  variance  for  I, 

is 

2 

o2  N 

S2  =  Wl  ' 

N 

hy 

IN  Lj 

i=i 

WW 

-l2 

j 

W 

which  may  be  used  in  efficiency  calculations.  However,  to  properly  calculate 
the  effect  of  the  correlation,  it  is  necessary  to  estimate  the  variance  of  the 
difference  directly.  That  is,  if 


g2(x.)f2(xi) 

*i - "gl(xi) 

is  the  difference  in  the  i^  history  and 


(3. 84) 
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(3.85) 


N 


is  the  average  difference,  then  the  sample  variance  is 


rN 

• 

N  .  1 

s2  -  1 

s  -  FT1 

Xft  -  *>2 

N 

=  m 

1 

R 

i 

CM 

<4 

W 

„i=l 

- 

i=l  J 

Alternatively,  batching  (Section  2. 4.  2)  can  be  applied  to  the  differences. 

3. 2. 4. 3  Further  Considerations 

The  equations  in  the  preceding  section  show  that  by  simulating  and 
biasing  the  results  appropriately,  an  estimate  for  I ^  can  be  readily  generated. 
This  can  obviously  be  generalized  to  the  case  of  three  or  more  similar 
problems.  The  time  saving  gained  by  not  making  several  separate  simulations 
is  obvious.  In  addition,  there  will  be  all  the  advantages  of  high  correlation  due 
to  the  use  of  a  common  set  of  random  numbers. 

However,  the  use  of  history  reanalysis  is  not  universally  beneficial 

and  may  sometimes  be  less  efficient  than  independent  simulations.  The  simu- 

larity  of  the  equations  for  history  reanalysis  to  those  of  importance  sampling 

has  already  been  noted.  It  should  then  be  clear  that  the  random  sample 

. . .  ,XN  which  has  been  chosen  from  f^(x)  is  not  likely  to  be  the  optimum 

choice,  in  the  sense  of  'importance',  for  the  simulation  of  ggMfgte).  Thus, 

the  variance  of  1 2  is  likely  to  be  greater  than  that  which  would  be  obtained 

from  a  direct  simulation  of  Ig.  Hopefully,  the  gain  in  efficiency  effected  by 

the  correlation  and  reduced  computation  will  more  than  offset  this  variance 

increase  but  this  will  not  be  true  in  all  cases.  Obviously  the  more  similar 

the  two  cases  are,  the  more  optimum  the  selection  will  be  for  computing  !„. 

6 

Thus,  history  reanalysis  works  best  for  the  problems  which  are  most  similar 
which  are  the  cases  where  variance  reduction  is  most  necessary. 
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There  is  an  important  class  of  problems  where  history  reanalysis  is 
trivially  accomplished.  This  occurs  when 

!gj(x)f  j(x)  ,  in  some  region  A 

(3.87) 

0  ,  elsewhere 

An  example  of  this  is  a  simulation  that  is  run  for  a  fixed  real-time  interval, 

Tj,  and  it  is  desired  to  know  the  results  of  a  case  that  was  limited  to  a  shorter 
time  interval,  Tg.  Then  history  reanalysis  consists  of  making  a  single  simu¬ 
lation  with  the  longer  time  limit,  T^,  scoring  for  the  first  case  all  events,  and 
scoring  for  the  second  case  only  those  events  for  which  time  is  less  than  T2. 

Several  extensions  readily  come  to  mind.  Most  significantly,  parame¬ 
tric  studies  to  determine  the  impact  of  several  forms  of  a  sampling  distribution 
can  be  readily  performed.  This  capability  is  often  overlooked  in  simulation 
studies  resulting  in  considerable  unnecessary  expense. 

3. 2. 5  Control  Variates111’ 14»  20»  24>  34»  36) 

3. 2. 5. 1  General  Concept 

In  many  situations  where  analytic  models  are  difficult  or  impossible 
to  develop,  there  exist  simplications  or  approximations  to  the  problem  having 
analytic  or  closed  form  solutions.  In  these  situations,  the  analytic  information 
can  be  beneficially  exploited  to  reduce  variance  by  what  is  referred  to  as  con¬ 
trol  variates.  With  this  technique,  instead  of  estimating  a  parameter  directly, 
the  difference  between  the  problem  of  interest  and  some  analytical  model  is 
simulated.  The  variance  reduction,  or  increase  in  accuracy  in  estimating  the 
parameters  of  interest,  is  directly  related  to  the  degree  of  correlation  between 
the  analytic  and  the  true  process.  Application  of  this  technique  is  again  very 
general  and  should  prove  very  useful  when  analytical  representations  of  simpli¬ 
fied  models  for  the  system  exist. 
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The  control  variate  method  has  several  of  the  features  similar  to  those 
of  the  correlation  technique  and  indeed  in  some  instances  is  addressed  within 
the  context  of  correlation.  However,  the  manner  in  which  this  technique  is 
applied  is  somewhat  distinctive  and,  therefore,  will  be  treated  separately  here. 

3. 2. 5. 2  Analytical  Formulation 
Again  consider  the  integral 


I 


g(x)f(x)dx 


(3.  88) 


Assume  that  it  is  possible  to  determine  a  function  h(x)  whose  expected 
value  is  known  (or  analytically  determinable)  and  which  closely  approximates 
g(x).  Qualitatively  such  a  situation  is  shown  in  Fig.  3. 11.  Let 


+® 

h(x)f(x)dx 


(3. 89) 


and  assume  that  6  is  known 
Then  I  can  be  expressed  as 


I  - 


h(x)f(x)dx  + 


£  [g(x)  -  h(x)  Jf(x)dx 


[g(x)  -  h(x)]f(x)dx  =  0  +  Ix 


(3. 90) 


The  function  h(x)  is  called  the  control  variate  for  g(x)  and  may  be  some 
approximation  (or  guess)  to  g(x). 
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Now,  since  h(x)  has  been  selected  so  that  the  first  integration  can 
be  completed,  simulation  is  required  only  on  the  second  term, 


•/' 


[g{x)  -  h(x)Jf(x)dx 


(3. 91) 


If  crude  sampling  is  used  to  simulate  Ij,  then  a  random  sample  would 
be  obtained  by  selecting  Xj, . . .  ,X^  from  f(x)  and  using 


N 


N 

1=0  +  1/n]T  g(Xj)  -  i/n£  h(Xt)  =  0  +  l/N^  ^ 
i=l  i=l  i=l 


N 


(3. 92) 
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S 


where 


=  g(Xi)  -hCXj)  (3.93) 

An  estimate  for  the  sample  variance  for  purposes  of  efficiency  calculations  is 
given  by 


where 

N 

A  =  1/N  A. 

i=l 


(3.94) 


(3.  95) 


The  use  of  control  variates  is  but  another  manifestation  of  the  use 
of  information  about  the  problem  to  reduce  the  variance.  In  this  case  a 
knowledge  of  the  approximate  behavior  of  the  system  was  used  to  advantage. 
Its  effectiveness  is  greatly  dependent,  however,  on  how  good  h(x)  can  be 
selected  to  approximate  g(x). 

It  is  worthwhile  to  note  that  if  an  approximate  shape  for  g(x)  is  not 
known,  it  is  often  possible  to  obtain  an  approximation  by  simply  selecting  a 
few  values  of  x  and  plotting  the  results.  A  straight  line  fit  to  the  results  or 
some  other  simple  formulation  may  significantly  improve  the  efficiency  of  the 
simulation. 

3.2.6  Antithetic  Variates^9,  U> 12>  14, 28>  34’  36) 

3. 2. 6. 1  General  Concept 

The  concept  of  antithetic  variates  is  somewhat  related  to  that  for  con¬ 
trol  variates  except  that,  rather  than  seeking  a  function  that  is  similar  to  the 
function  being  estimated,  a  function  is  sought  which  is  negatively  correlated. 
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i 
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The  estimation  process  is  then  structured  to  exploit  this  negative  correlation 
to  reduce  the  variance  in  the  estimator.  The  basic  idea  can  be  used  to  develop 
very  sophisticated  and  powerful  methods.  Two  methods  will  be  presented  below. 

3. 2. 6. 2  Method  I  for  Antithetic  Variates 

The  use  of  antithetic  variates  can  be  introduced  very  simply  as  follows: 
consider  again  the  parameter  I  to  be  estimated  where 

^oo 

I  =  f  g(x)f(x)dx  (3. 96) 

•/— 03 


I 


t 


Assume  an  unbiased  estimator  1^  for  I  exists.  For  example,  if 
crude  sampling  is  used 

N 

*’i  '  1/N  £  g(xi)-  (3-97) 

i=i 

A 

Suppose  a  second  unbiased  estimator,  Ig  for  I  also  exists. 

A 

A  third  unbiased  estimator  0  for  I  can  be  constructed  using 
0  =  1/2(I1  +  I2)  (3.98) 

and 

E[0]  =  I 

The  variance  in  the  estimator  6  is  given  by 

o2(0)  =  1/4  a2^)  +  1/4  <r2(I2)  +  1/2  cov  (1^)  (3.  99) 

A  A 

Now,  if  Ij  and  Ig  are  selected  such  that  they  are  negatively  correlated, 
then 

cov  ^  0  (3. 100) 
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i 


(3. 101) 


If  covfl^Ig)  is  sufficiently  large  (negatively),  then 

2,z 


a  (6)  <  a  (ij) 


and 


a2(0)  <o2fi2) 


(3. 102) 


Thus,  the  combined  estimator  9  of  ^  and  Ig  will  have  a  smaller  variance 

A  A 

than  either  1^  or 

A 

The  estimator,  I0,  is  called  the  antithetic  variate  since  it  is  an 

“  A 

estimator  that  compensates  for  the  variation  in  Ij.  This  is,  of  course,  the 
concept  of  negative  correlation. 

There  is  a  convenient  manner  in  which  an  antithetic  variate  can  be 
obtained.  This  is  as  follows: 

Consider  the  estimator  to  be  derived  from  crude  sampling.  To 
accomplish  this  a  set  of  random  numbers  R^, . . . ,  will  be  generated  from 
U(0, 1)  and  the  corresponding  values  of  X,  say  X^, . . .  ,XN  can  be  obtained 
from 

Xi 

R.  =  f  f(x)dx  ;  i  =  1, . . .  ,N  (3.103) 

It  is  clear  that  {X.}  are  from  the  distribution  f(x).  Now  consider  genera¬ 
tion  (f  another  set  of  values  of  X,  X^, . . .  ,X^  using 

X’ 

1  -  R.  -  f  f(x)dx  ;  i  =  1, . . .  ,N  (3. 104) 

J  —CD 


Again  X^,...,X^  will  be  from  the  distribution  f(x).  The  pairs  of  values 
of  X.  and  Xj  are,  of  course,  correlated  since  the  same  random  number 


75 


R.  was  used  to  generate  both  values  of  X.  Furthermore,  these  values  of 
Xj  and  XJ1  are  neglatively  correlated.  That  i$  small  X^  corresponds 
to  large  Xj.  This  is  shown  conceptually  in  Fig.  3. 12. 

Defining 

=  1/2^)  +  g(Xj)]  (3. 105) 

Then  the  estimator  for  I  using  the  antithetic  variates  is 
N  N 

8  =  1/n£  a  =  1/2n£  [g(x()  +  g(X') ]  (3. 108) 

i=l  1=1 


X  —  1 


Fig.  3. 12  Schematic  Showing  a  Method  to  Generate  Antithetic  Variates 
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The  sample  variance  is  determined  from 


3.  2. 6. 3  Method  II  for  Antithetic  Variates 

A  second  approach  to  antithetic  variates  that  has  proven  very  success¬ 
ful  is  to  use  a  combination  of  stratified  sampling  along  with  antithetic  vari¬ 
ates.  Consder  a  case  with  2  strata  as  shown  in  Fig.  3. 13.  Assume  the  range 


Fig,  3. 13  Method  n  for  Application  of  Antithetic  Variates 


of  f(x)  is  broken  up  by  Xa  into  the  ranges  -co<x<Xa  and  Xa<x<«o.  Now, 
suppose  a  random  number  is  selected  from  U(0,  lh  Then  select  X^  from 


(3. 108) 


and  select  Xj  from 


a +  (1-0)^  = 


(3. 109) 


Clearly  Xj  and  X.’  are  distributed  according  to  f(x)  within  their  appropriate 
ranges.  Also,  X.  and  Xj  are  negatively  correlated  since  small  X^  implies 
large  X{  and  vice  versa.  Now  define 

=  ag(Xi)  +  (l-a)g(Xp  (3.110) 


An  unbiased  estimator  for  I  is 


N  N 

0=  l/N^Sj  =  l/N^fotfXj)  +  (l-odg(X’)]  (3.111) 

i=l  i-1 
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and  the  sample  variance  is 


i=l 


(3.112) 


If  a  =  1/2,  then  Eq.  3.  Ill  reduces  to  Eq.  3.106. 

The  difficulty  in  the  use  of  this  second  approach  is  in  the  selection  of 
a.  A  general  rule  is  to  select  a  such  that 

g(xa)  =  «g(XL)  +  (l-ot)g(Xu>  (3. 113) 


where  and  XL  are  the  upper  and  lower  limits  of  the  range  of  f(x). 

An  alternate  approach  is  to  utilize  a  trial  and  error  method  to  test  various 
values  of  a  and  estimate  the  improvement  realized  in  the  efficiency. 

It  is  important  to  recognize  that  the  choice  of  a  will  not  impact  the 
simulation  in  the  sense  that  the  estimator  will  still  be  unbiased.  However,  it 
may  result  in  some  loss  of  efficiency  if  a  poor  value  is  selected. 

3.2.7  Regression*7,11,14* 

3. 2. 7. 1  General  Concepts 

Regression  techniques  have  found  limited  application  in  Monte  Carlo 
simulation  in  spite  of  the  seemingly  important  advantages  that 

•  They  can  be  applied  to  a  wide  variety  of  Monte  Carlo  simulations 

•  They  will  produce  unbiased  estimates 
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•  They  can  be  applied  in  a  situation  where  correlation  is  known 
to  exist  and  will  take  advantage  of  such  correlation 

e  If  applied  to  a  situation  where  no  correlation  exists,  nothing  is 
lost  except  the  additional  computational  effort  involved 

Its  use  appears  to  be  rather  limited  due  to  the  effort  involved  in 
formulation  of  the  appropriate  estimators  and  the  difficulty  encountered 
when  attempts  are  made  to  view  a  practical  simulation  problem  within  the 
context  of  known  formulations  of  the  regression  method. 

3. 2. 7. 2  Analytical  Formulation 


To  formalize  the  regression  method,  assume  a.  set  of  integrals 
Ij,...,lp  are  to  be  estimated.  Assume  a  set  of  estimates  9^, . . . ,6n  (n^p) 
are  available  satisfying  the  condition  that 

E[0j]  -  +  • .  •  +  ^jp^p  >  i  =  1 . n  (3. 114) 


where  the  matrix 


(3.115) 


is  known.  It  is  assumed  that  a  sample  is  available  consisting  of  N  indepen¬ 
dent  sets  of  simulated  values  for  0^ ,  namely  0^, . . .  0Nj  ;  j  =  1, . . . ,  n. 
Then 


!  j  ®  !»•••» n 


«J.  116) 
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and  the  column  matrix 


can  be  readily  constructed. 


(3. 117) 


Now,  an  estimate  for  the  matrix  f  is  desired  where  t  is  defined  as 


It  will  be  recalled  from  elementary  statistics  that  the  minimum  variance 
unbiased  estimator  for  f  is  given  by 


T  *+-1  -1  -*T  -*  -1  a 
1  V  1  A)  1  A  T  V  1  Q 


(3.119) 


where 


(3. 120) 


is  the  transpose  of  A 
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That  is 


Vij  =  E[|  0J  -  E(8t)}{6j  -  E(8j)|]  i  =  1, ...,n 

J  =  1»  •  •  • » n 

Unfortunately  is  usually  not  known.  However,  an  estimate  for  V, 
be  obtained  using 

N 

*eki  “  V  “  V  5  i  =  1, . . . ,  n 

k=l  j  =  1, . . . ,  n. 


where  ^  ;  i  =  1, . . . ,  n  are  obtained  from  Eq.  3. 116. 
and 


The  new  estimator  is  therefore 


This  is  still  unbiased  since 


E[I*]  =  I 


It  is  recommended  that  batching  be  used  to  obtain  an  estimate  for  the 
2 

variance  o  . 


(3. 121) 

can 

(3. 122) 

(3. 123) 

(3.124) 

(3.125) 
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As  it  is  formulated  above,  the  regression  technique  is  very  easy  to 
apply.  All  that  is  required  is  to  obtain  $  and  J  from  the  sample  values 
and  use  Eq.  3. 124  to  obtain  an  unbiased  estimator  for  I. 

In  spite  of  its  relatively  simple  formulation  which  is  based  on  some 
elementary  statistical  concepts,  the  method  is  difficult  to  apply  in  practice 
primarily  because  it  is  generally  difficult  to  formulate  the  estimators 

A  A 

0J, . . . , 0  .  It  has  evidently  been  applied  in  only  trivial  situations  and  reali¬ 
zation  of  its  full  potential  must  await  additional  development  and  experience. 
Clearly,  one  characteristic  a  problem  should  have  before  attempting  to  apply 
this  method  is  a  linear  combination  of  the  estimator  and  parameters  to  be 
estimated  as  indicated  by  Eq.  3. 114. 

3. 3  SPECIALIZED  TECHNIQUES 

In  the  foregoing  sections  several  very  useful  and  well  developed 
Monte  Carlo  techniques  were  presented  and  discussed.  There  are,  however, 
a  large  number  of  additional  procedures  that  might  warrant  consideration  in 
situations  where  some  of  the  preceding  techniques  proved  ineffective.  These 
are  either  not  well  developed  (e.g. ,  orthonormal)  or  they  may  be  extremely 
specialized  and  have  therefore  found  application  in  very  specific  problems 
(e.g. ,  the  adjoint  method).  It  must  be  recognized, however,  that  the  application 
of  these  specialized  techniques  may  bo  necessary  to  achieve  a  reasonable 
answer  in  very  difficult  problems  but  should  be  resorted  to  after  this  becomes 
abundantly  clear.  These  specialized  techniques  are  however  fertile  fields 
for  further  research  into  variance  reduction. 


3.3.1  Sequential  Sampling^**’  ^ 

Occasionally  there  is  little  or  no  a  priori  information  concerning  the 
expected  results  of  the  simulation  or  perhaps  what  knowledge  there  is  strictly 


qualitative  with  no  quantitative  values  on  which  to  base  a  choice  of  an  impor¬ 
tance  function  or  Russian  Roulette  or  splitting  standards.  In  such  a  case  it 
may  be  possible  to  use  sequential  sampling.  This  is  not  a  specific  variance 
reduction  technique  but  rather  a  general  approach  to  the  use  of  other  techniques. 
In  sequential  sampling  an  initial  run  is  made  with  little  or  no  variance  reduc¬ 
tion  used.  Then  the  results  of  this  first  run  are  analyzed  to  calculate  an 
importance  function  or  used  to  estimate  Russian  Roulette  standards,  strati¬ 
fied  sampling  parameters,  etc.  A  second  run  is  made  using  a  variance  re¬ 
duction  technique  with  the  parameters  estimated  in  the  first  run.  Now  these 
results  can  be  analyzed  in  conjunction  with  the  first  set  of  histories,  to  improve 
the  estimation  of  the  sampling  parameters.  A  third  run  can  then  be  made  using 
the  improved  sampling  parameters  and  this  'self-learning'  process  can  be 
carried  out  through  an  indefinite  number  of  stages  with  the  efficiency  of  the 
sampling  improving  at  each  stage.  Despite  the  simplicity  and  intuitive  appeal 
of  such  an  approach,  little  or  no  work  on  sequential  sampling  has  been  per¬ 
formed.  (There  has  been  some  development  of  'self-learning'  techniques  applied 
to  stratified  sampling,  and  preliminary  work  is  in  progress  in  some  other 
areas. )  Consequently  little  can  be  said  regarding  implementation  techniques, 
trade-offs  of  computation  required  to  estimate  sampling  parameters  versus  the 
efficiency  gain  from  improved  sampling,  or  possible  pit -falls  (e.g. ,  can  an 
initial  choice  of  'underbiased'  or  'overbiased'  parameters  lead  to  estimation 
of  parameters  that  are  even  more  underbiased  or  overbiased  with  the  sequen¬ 
tial  process  feeding  on  itself  destructively?) 

3.3.2  Orthonormal  Functions*14' 19»  25» 30>  34* 

The  use  of  orthonormal  functions  in  general  Monte  Carlo  simulations 
has  received  little  attention,  although  it  does  have  potential  for  greatly  im¬ 
proving  simulation  efficiency  when  it  is  applied  to  problems  having  a  large 
number  of  dimensions. 
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Basically,  the  approach  is  to  first  define  a  set  of  orthogonal  functions 
over  a  region  of  multiple  integration.  Next ,  a  sampling  scheme  must  be 
structured  that  will  permit  efficient  sampling  over  this  region  from  a  joint 
probability  density  function.  In  general  the  procedures  to  accomplish  these 
tasks  are  not  well  developed  and  will  not  be  further  discussed  here.  This  does 
not  imply,  however,  that  the  potential  gains  that  can  be  realized  with  this 
technique  are  not  worth  the  effort  but  only  that  no  general  guidelines  or  problem 
approach  can  be  presented  to  provide  reasonable  assurance  that  the  effort 
would  be  fruitful. 

3.3.3  Adjoint  Method*15, 17 * 21) 

In  formulating  the  mathematical  equations  for  the  simulation  of  a 
process,  it  frequently  is  the  case  that  there  is  another  set  of  equations,  "in¬ 
verted"  or  "adjoint"  with  respect  to  the  first,  that  is  mathematically  equivalent 
in  the  sense  that  a  solution  to  one  set  of  equations  will  also  give  a  solution  to 
the  second  set.  This  second  set,  the  adjoint  equations,  may  not  represent  any 
real  process  but  can  be  simulated  anyway.  Depending  on  the  nature  of  the 
problem  and  the  result  being  calculated,  it  may  be  easier  or  more  efficient 
to  simulate  the  adjoint  equations  than  to  simulate  the  direct  process.  It  may 
also  be  possible  to  split  the  problem  into  two  parts,  one  of  which  is  best  simu¬ 
lated  directly  and  one  of  which  is  best  simulated  by  the  adjoint  process. 

There  is  a  close  interrelationship  between  the  adjoint  solution  and  the 
importance  of  sampling  in  the  direct  simulation.  This  leads  to  interesting 
possibilities  such  as  using  approximate  methods  or  a  simplified  process  to  cal¬ 
culate  the  adjoint,  then  using  the  adjoint  solution  to  generate  importance 
sampling  for  a  full  simulation  of  the  direct  equations.  Another  alternative 
is  a  form  of  sequential  calculation  where  direct  and  adjoint  solutions  alternate 
with  each  solution  serving  as  the  importance  function  for  sampling  the  next 
solution. 


As  with  many  of  the  more  powerful  variance  reduction  techniques, 
the  adjoint  has  been  exploited  very  successfully  in  the  area  of  radiation  trans¬ 
port.  This  was  possible  due  to  the  formulation  of  the  radiation  transport 
problem  in  a  precise  (although  difficult  to  solve)  linear  integral  equation  where 
an  adjoint  formulation  could  be  easily  established. 

Unfortunately,  in  most  Monte  Carlo  simulations  such  a  compact 
formulation  is  not  generally  available  and  furthermore  would  be  difficult  to 
develop.  However,  the  concept  of  the  adjoint  offers  some  intriguing  possibili¬ 
ties.  For  example,  rather  than  tracing  an  individual  history  through  the  sys¬ 
tem  in  a  natural  manner,  (i.  e. ,  from  start  to  finish)  it  is  possible  to  trace 
the  individual  from  a  final  exit  point  to  the  starting  position.  As  an  example, 
should  an  adjoint  formulation  be  developed  with  respect  to  antisubmarine  war¬ 
fare  application  it  would  not  simulate  in  a  forward  manner  to  determine  events 
that  result  in  a  submarine  kill,  but  would  rather  start  from  a  submarine  kill 
and  trace  backward  through  the  simulation  to  determine  what  sequence  of 
events  could  have  led  up  to  the  kill.  Many  other  applications  could  be  en¬ 
visioned  which  could  exploit  the  use  of  the  adjoint  cr  backward  formulation. 
This  technique  must  however  await  further  development  and  use  before  it  be¬ 
comes  a  generally  applicable  method  such  as  is  found  in  importance  sampling 
or  correlation. 

3.  3. 4  Transformations^*  34) 

The  transformation  method  is  essentially  a  special  form  of  importance 
sampling.  It  differs  from  other  types  of  importance  sampling  in  that  a  priori 
information  about  the  process  is  formulated  in  a  parametric,  closed-form 
representation  which  is  then  used  to  alter  the  sampling  procedure  by  the 
transformation.  For  example  if  an  approximate,  parametric  representation 
of  the  importance  function  is  known,  then  a  transformation  can  generate  an 
altered  process  where  the  important  areas  have  greater  probability  and  the 
unimportast  areas  have  low  probability. 


This  method  has  been  largely  employed  in  radiation  transport 
calculations  where  the  functions  of  interest  frequently  have  an  approxi¬ 
mately  exponential  form.  There  an  exponential  transform  generates  an 
altered  process  with  a  greatly  reduced  variance. 

3.3.5  Conditional  Monte  Carlo(2> 1Q> 14>  34) 

If  the  particular  problem  being  investigated  is  very  complex  in  that 
it  deals  with  a  complicated  sample  space  or  the  probability  density  function 
is  difficult  to  select  from,  it  may  be  possible  to  embed  the  given  sample 
space  in  a  much  larger  space  in  which  the  desired  density  function  appears 
as  a  conditional  probability.  The  larger  space  and  its  accompanying  density 
are  chosen  to  be  much  simpler  in  definition  although  they  involve  more 
variables.  Simulation  of  the  large  problem  can  be  much  simpler  than  the 
original  complex  problem,  and,  despite  the  added  computation  required  to 
calculate  the  conditional  probabilities,  the  gain  in  efficiency  can  be  quite 
high.  Furthermore,  the  added  degrees  of  freedom  gained  by  the  added 
variables  and  the  choice  of  a  space  and  density  function  in  which  to  embed 
the  original  problem  can  be  utilized  to  secure  additional  variance  reduction. 

Despite  the  potential  power  of  conditional  Monte  Carlo  for  solving 
complex  simulation  problems,  it  has  seen  very  little  use.  In  large  part  this 
is  due  to  the  creative  leap  needed  to  view  the  problem  in  a  larger  context  and 
to  design  the  larger  space  in  which  to  embed  the  problem.  In  addition,  while 
the  theoretical  basis  for  this  technique  has  been  developed,  very  little  in  the 
way  of  practical  examples  or  applications  has  been  produced,  and  the  method 
is  still  not  well  understood. 
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PART  n 


APPLICATION  OF  VARIANCE  REDUCTION  TECHNIQUES 


4.  SELECTION  OF  VARIANCE  REDUCTION  TECHNIQUES 

Unless  one  is  very  familiar  with  the  concepts  of  variance  reduction, 
the  selection  of  a  promising  approach  for  a  particular  problem  can  cause 
considerable  difficulty  due  to  the  large  number  of  possibilities  available. 

This  section  of  the  report  will  be  directed  toward  aiding  the  analyst  in  selec¬ 
tion  and  implementation  of  an  appropriate  variance  reduction  technique  or 
techniques.  This  is  accomplished  by  way  of  a  systematic  procedure  to: 

•  Define  the  problem  information  that  can  be  used  as  a  basis 
to  select  an  appropriate  technique  or  techniques. 

•  Select  the  specific  technique  or  techniques  that  should  be 
considered  for  a  given  problem. 

•  Provide  basic  guidelines  to  implement  the  selected  procedure. 

Each  of  these  aspects  are  described  in  Sections  4. 1  through  4. 3  respectively. 

There  are  several  approaches  to  use  the  information  of  this  part 
with  that  of  Part  I.  The  first,  and  probably  the  most  effective,  is  to  review 
briefly  the  material  in  Part  I  and  then  proceed  to  defining  the  available 
information  on  the  problem,  selecting  the  appropriate  technique  and  proceed¬ 
ing  to  its  implementation.  Alternately,  the  required  information  for  selection 
of  the  particular  variance  reduction  technique  could  first  be  defined  and  the 
procedure  selected  prior  to  reviewing  the  material  in  Part  I. 

4.  i  DEFINITION  OF  PROBLEM  INFORMATION 

The  usefulness  of  variance  reduction  techniques  is  ultimately  deter¬ 
mined  by  how  effectively  known  information  about  the  problem  is  utilized. 
Problem  definition  is  thus  of  paramount  importance.  Before  considering 
variance  reduction  techniques,  it  is  essential  to  characterize  the  aspects  of 
the  problem  that  might  indicate  which  techniques  could  be  fruitfully  applied.  To 
evaluate  the  usefulness  of  these  methods  for  a  particular  problem,  it  is  helpful 
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to  know  the  information  defined  in  Table  4. 1.  Such  information  is  not  strictly 
required  as  certain  approaches  (such  as  sequential  sampling)  can  generate 
useful  information,  but  this  is  not  generally  accomplished  without  cost.  Thus 
prior  information  is  highly  desirable.  The  more  that  is  known,  the  better 
the  ultimate  results  will  be.  This  information  will  be  used  in  conjunction  with 
the  characteristics  of  the  techniques  described  in  the  next  section. 

Consider  item  1  in  Table  4. 1.  Here  it  is  required  to  clearly  define 
which  parameters  are  to  be  estimated.  This  could  include  mean  values,  vari¬ 
ances  or  probabilities.  Furthermore,  if  it  is  known  that  the  problem  is  such 
that  sensitivity  or  perturbation  studies  are  to  be  performed,  it  is  important  that 
this  be  recognized  at  the  outset.  Additional  information  of  significance  in¬ 
cludes  the  sequential  nature  of  a  problem,  as  well  as  identifying  any  input 
conditions  that  are  random  variables. 

Under  the  second  item  in  Table  4. 1,  the  significance  of  integral 
formulations  for  parameters  to  be  estimated  is  pointed  out.  The  importance 
of  integral  formulations  cannot  be  over -emphasized  since  it  is  this  basic  in¬ 
tegral  structure  which  is  used  to  understand  almost  every  variance  reduction 
method.  In  addition  to  the  analytical  forms,  the  knowledge  of  various 
expected  values  in  the  problem  and  the  availability  of  simplified  analyt¬ 
ical  expressions  which  are  positively  or  negatively  correlated  with  the 
parameters  whose  expected  values  are  being  estimated  can  provide  key 
information  as  to  the  variance  reduction  approach  to  be  finally  implemented. 

Next,  identifying  intermediate  events  or  parameters  which  assume 
significance  relative  to  their  importance,  unimportance  or  insensitivity  to 
the  problem  outcomes  can  provide  valuable  information.  A  key  ingredient  for 
improving  the  efficiency  of  multistage  simulations  is  identification  of  variables 
or  outcomes  in  the  problem  that  will  probably  lead  to  either  important  or  un¬ 
important  outcomes  of  the  final  events.  Finally,  identifying  those  final  outcomes 
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TABLE  4. 1 


Recommended  Problem  Information  to  be  Defined  Prior  to  Selecting 
and  Implementing  Variance  Reduction  Techniques 

1.  Define  nature  of  the  problem  relative  to 

•  expected  values  (means,  variances,  probabilities,  etc.)  to  be 
estimated 

•  sensitivities,  perturbations  or  variations  of  parameters  of  interest 

•  possible  mathematical  formulations  (e.  g. ,  integral  equations, 
expected  values,  etc.) 

•  any  sequential  characteristics  such  as  independent  paths,  outcomes 
dependent  on  intermediate  steps,  etc. 

•  input  conditions  which  are  random  variables  to  be  sampled 

2.  Identify  portions  of  the  problem  or  parameters  to  be  estimated  that  can  be 

•  expressed  in  an  analytical  form  su<:h  as  single  or  multidimensional 
integrals,  differential  and/or  integral  equations 

•  solved  analytically,  such  as  expected  values,  variances,  probabilities, 
etc. 

•  represented  by  approximate,  simplified  positively  correlated  analy- 
tical  expressions 

•  represented  by  approximate,  simplified  negatively  correlated 
analytical  representations 

•  established  as  relatively  not  important  to  final  outcomes  compared 
to  other  aspects  of  the  problem 

3.  Identify  variables  in  the  problem  which 

•  are  very  important  to  the  expected  outcome 

•  are  not  expected  to  significantly  impact  the  results 

•  over  their  range  of  variation  have  relatively  little  effect  on  the 
problem 

•  are  strongly  correlated  with  other  variables 

4.  Locate  final  events  or  outcomes  of  the  problem  which 

•  have  very  small  probabilities 

•  have  very  large  probabilities 

•  have  outcomes  relatively  insensitive  to  problem  parameters 

•  have  known  probabilities  of  occurrence  from  intermediate  stages 
in  the  problem 

•  are  linear  combinations  of  other  events  or  random  variables 

•  have  known  correlation  with  other  events  or  outcomes 
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which  occur  with  large  or  small  probabilities,  are  insensitive  to  problem 
parameters,  have  correlation  with  other  events,  or  the  final  events  which 
have  known  probabilities  of  occurrence  from  intermediate  stages  will  also 
prove  to  be  very  useful  in  effectively  reducing  the  variance. 

It  should  also  be  recognized  that,  in  general,  variance  reduction  tech¬ 
niques  are  aimed  at  reducing  the  variance  of  only  one  parameter  or  aspect 
of  the  process  being  simulated.  Using  variance  reduction  techniques  designed 
for  one  parameter  will  usually  reduce  the  effectiveness  of  the  simulation  to 
estimate  other  parameters.  It  is  very  important,  therefore,  to  determine  all 
of  the  results  which  will  be  desired  from  the  simulation  before  searching  for 
a  technique  to  apply  to  a  given  situation.  If  several  quantities  are  to  be  esti¬ 
mated  by  the  simulation,  the  selection  of  a  variance  reduction  technique  has 
to  be  considered  from  the  standpoint  of  all  of  these  parameters.  In  many  cir¬ 
cumstances  it  may  be  beneficial  (or  even  necessary)  to  implement  a  different 
variance  reduction  technique  for  each  parameter.  This  might  be  accomplished 
in  the  extreme  case  by  developing  a  different  simulator  for  each  parameter  of 
interest. 

4.  2  SELECTION  OF  VARIANCE  REDUCTION  TECHNIQUE (S) 

A  comprehensive  summary  of  the  variance  reduction  techniques  considered 
in  Section  3  is  shown  in  Table  4.  2.  Here,  each  alternative  is  described  briefly 
along  with  the  suggested  criteria  for  application.  In  addition,  advantages,  dis¬ 
advantages,  and  typical  applications  are  noted.  As  will  be  seen  many  of  these 
techniques  are  interrelated,  although  their  method  of  application  may  differ 
substantially. 

Also  shown  for  each  technique  is  the  section  numbers  of  this  report  in 
which  details  of  the  approach  can  be  found.  The  first  section  noted  refers  to 
the  material  in  Part  I  and  the  second  references  Part  n.  As  may  be  seen 
from  a  brief  review  of  Table  4. 2,  there  is  substantial  variation  in  the  criteria 
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TABLE  4.2 

Summary  of  Variance  Reductjon  Techniques  and  Characteristics 


TABTjE  4. 2  (Continued) 
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tirlty  studies 


TABLE  4. 2  (Continued) 
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to  be  used  for  selecting  various  techniques.  This  indicates,  of  course,  the 
importance  of  problem  definition  and  the  value  of  known  information  prior  to 
selecting  an  approach. 

The  results  of  the  information  requirements  defined  as  noted  in 
Table  4. 1  can  readi?y  be  used  in  conjunction  with  Table  4. 2  to  define  a  recom¬ 
mended  variance  reduction  approach.  For  example,  if  at  a  certain  stage  in 
the  problem  it  is  known  that  a  certain  range  of  variables  would  not  be  of 
interest  to  the  final  outcomes  relative  to  a  second  range  of  the  variables,  the 
application  of  importance  sampling  or  Russian  Roulette  and  splitting  is  sug¬ 
gested  by  Table  4. 2.  The  next  step  would  be  to  proceed  to  the  sections 
indicated. 

A  list  of  references  which  describe  one  or  more  of  the  various  aspects 
of  each  cf  these  techniques  is  included  in  the  corresponding  section  indicated 
in  Part  I. 

4.  3  PROCEDURES  FOR  IMPLEMENTATION  OF  THE  SELECTED  VARIANCE 
REDUCTION  TECHNIQUES 

This  section  presents  general  guidelines  to  implement  the  more  im¬ 
portant  variance  reduction  techniques.  For  convenience,  the  order  in  which 
the  methods  were  presented  in  Part  I  will  be  followed  here.  It  is  recommended 
that  the  material  presented  here  be  used  in  conjunction  with  that  presented  in 
the  corresponding  section  of  Part  I.  Specifically,  the  implementation  guide¬ 
lines  are  presented  in  the  following  subsections: 


4.3.1 

Importance  Sampling 

4.3.2 

Russian  Roulette  and  Splitting 

4.3.3 

Systematic  Sampling 

4.3.4 

Stratified  Sampling 

4.3.5 

Expected  Value 

4.3.6 

Statistical  Estimation 

4.3.7 

Correlated  Sampling 

4.3.8 

History  Reanalysis 

4.3.9 

Control  Variates 

4.3.10 

Antithetic  Variates 

4.3. 11 

Regression 
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No  procedures  are  presented  for  implementation  of  the  specialized  techniques 
(sequential  sampling,  orthonormal  functions,  adjoint  method,  transformations 
and  conditional  Monte  Carlo)  presented  in  Section  3. 4  since  these  are  not  con¬ 
sidered  to  be  well  enough  developed  for  general  application. 

It  should  be  mentioned  that  the  material  presented  here  is  intended  as 
a  basic  guide  to  provide  general  procedures  for  implementation  of  the  variance 
reduction  technique  selected  from  Table  4. 2.  In  many  cases,  it  is  difficult 
to  provide  anything  more  than  a  rather  general  description  of  the  steps  to  be 
implemented.  However,  where  possible  specific  formulae  or  recipes  vrtrich 
h?ve  general  applicability  were  included.  The  specific  analytical  formulae 
of  interest  are  also  summarized  in  Appendix  A. 

4.3.1  Importance  Sampling 

Importance  sampling  is  the  term  for  modifying  the  sampling  procedure 
in  a  manner  ‘hat  will  tend  to  emphasize  the  more  important  aspects  of  the 
problem.  The  results  must  be  corrected  to  account  for  this  modification. 

importance  sampling  is,  in  many  cases,  necessary  for  obtaining  a 
reasonable  answer  and,  in  other  cases,  can  give  outstanding  improvements  in 
efficiency.  This  is  particularly  true  when  very  small  probability  events  can 
contribute  significantly  to  the  outcome  of  the  problem. 

One  danger  with  the  application  of  importance  sampling  is  that  it  can 
lead  to  results  worse  than  that  obtained  using  straightforward  sampling.  Such 
a  situation  can  occur  when  the  importance  function  is  not  carefully  selected. 
Furthermore,  the  method  requires  a  fairly  good  understanding  of  the  problem. 

4. 3. 1. 1  Implementation  Guidelines  for  Importance  Sampling 

The  general  considerations  that  should  be  followed  in  application  of 
importance  sampling  are  as  follows: 
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1.  Attempt  to  identify  one  random  variable  x  for  importance 
sampling  and  its  density  function  f(x).  Express  if  possible 
the  expected  value  being  estimated  as 

1=  Jg(x)f(x)dx  (4.1) 

2.  Determine  the  functional  form  of  g(x).  This  may  be  known 
analytically  in  trivial  cases.  In  complex  simulations,  it 

may  be  possible  to  input  selected  values  of  x  (not  necessarily 
from  f(x))  and  actually  obtain  an  estimate  for  the  form  of  g(x). 

3.  Plot  the  shape  of  f(x)g(x)  and  select  an  importance  function 
/*(x)  that  is  "similar"  in  form  to  g(x)f(x).  A  sketch  of  the 
basic  ideas  involved  is  shown  in  Fig.  4. 1. 


NOTE:  Select  f*(x)  to  approximate 
g(x)f(x) 


Fig.  4. 1.  Qualitative  Description  of  Importance  Sampling 


4. 


The  new  estimator  for  I  is 


‘i  ■  *  2-rxt- 

i=1  1 

where  X^, . . .  ,Xn  is  a  random  sample  from  f*(x). 

5.  The  estimator  for  the  sample  variance  is  given  by 


c2  N  1 
S  =  N^I  N 


(4.  2) 


(4.3) 


6.  Obtain  a  random  sample  X.. , . . .  ,X_.  ustag  crude  Monte 
Carlo  from  f(x)  and  estimate  1^  and  . 

7.  Obtain  an  estimate  for  the  efficiency  of  importance  sampling 
using 


f  = 


ts2 
— ? 
‘isi 


(4.4) 


where  t.  and  t  are  the  times  required  to  obtain  N  samples 
with  and  without  importance  sampling  respectively. 

It  should  be  noted  that  c  is  a  random  variable  and  is  subject  to 
uncertainty  which  will  depend  on  the  sample  size  N.  Thus,  it  is  usually 
a  good  practice  to  make  N  as  large  as  reasonably  possible  to  obtain  a 
good  estimate  for  c .  In  the  event  several  random  variables  are  involved 
in  the  problem,  the  suggested  procedure  is: 


1.  Identify  one  random  variable  x  for  importance  sampling 
and  express  the  estimator  as 


f  f(x)[  J  g(x,  y)f  (jfa)dy  ]dx  (4. 5) 


where  the  vector  y  is  all  the  remaining  random  variables. 
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2. 


For  a  range  of  values  of  x,  estimate  E[g2(x,y|x)]  where  y 
are  selected  from  their  corresponding  probability  distribu¬ 
tions.  If  yx . .  are  random  samples  of  y,  then 


g2(x,y|x) 


1 

n 


n 


2  ^2(x»yilx) 

i=l 


(4.6) 


is  used  to  estimate  E[g2(x,y|x)]  . 

3.  Select  f*(x)  to  approximate 

«x)E[g2(x,y|x)]1/2  . 

(This  can  sometimes  be  accomplished  graphically.) 

4.  Estimators  with  importance  sampling  are  now  respectively 


The  efficiency  is  computed  in  the  same  manner  as  in  the  previous  case 
4.3.2  Russian  Ro  ilette  and  Splitting 

Russian  Roulette  and  splitting  is  a  powerful  technique  that  is  easiest 
to  apply  when  the  problem  is  characterized  by  a  series  of  events.  Examples 
are  found  in  problems  in  queueing,  series  subsystems,  radiation  trans¬ 
port,  random  walk,  etc. 
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This  technique  is  in  essence  a  simplified  form  of  importance 
sampling.  One  potential  difficulty  with  Russian  Roulette  and  splitting  is 
the  possibility  that  it  may  lead  to  a  large  number  of  histories  being  traced 
through  the  system  at  one  time.  While  Russian  Roulette  is  generally  easy 
to  implement,  incorporating  splitting,  (especially  in  an  existing  program), 
may  be  more  difficult  due  to  the  need  to  store  problem  conditions  and  later 
’restart'  in  mid-history.  Following  a  ’split’,  the  program  must  continue 
with  the  simulation  of  one  of  the  histories  until  it  terminates,  and  the 
program  must  then  go  back  to  the  point  oi  the  split  and  restore  the 
program  conditions  at  that  time  so  that  the  next  ’daughter'  simulation 
can  proceed. 

The  general  steps  that  can  be  followed  for  Russian  Roulette  and 
splitting  are: 

1.  Identify  stages  in  the  problem  for  which  the  possible  conditions 
at  those  stages  can  be  divided  into  regions  »i-VvA 
such  that  all  the  points  in  any  one  region  have  roughly  the 
same  importance. 

2.  Choose  average  weight  standards,  w^.,  i  =  1,  N,  for  each 
region  that  are  inversely  proportional  to  that  region's  im¬ 
portance.  The  mean  weight  standard  at  any  stage  should  be 
roughly  the  average  weight  expected  to  reach  that  stage  from 
the  previous  stage. 

3.  If  no  other  variance  reduction  techniques  are  being  employed, 
set  high  and  low  weight  standards,  wH  and  wj.,  equal  to 
the  average,  w^..  If  there  are  other  variance  reduction  tech¬ 
niques  in  use  which  are  causing  weight  changes,  then  W|j.  and 
wl,  should  be  spaced  sufficiently  far  above  and  below  wyL  so 
that  there  is  no  unnecessary  Russian  Roulette  and  splittingSut 
also  so  that  there  is  not  a  wide  variation  of  weights  among  his¬ 
tories  of  roughly  the  same  importance. 

4.  Whenever  a  history  arrives  at  a  particular  stage  in  region 
R.  with  a  weight  w,  carry  out  the  following  manipulations: 

a.  If  w  <  wLi,  play  Russian  Roulette: 

i.  Kill  (terminate)  the  history  with  probability  1  -  ,or 
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ii.  Let  the  history  survive  (continue)  carrying  a  new 
weight  w^  with  probability  w/w^. 

b.  If  wL  <  w  <  wH^  ,  continue  the  history  with  weight  w. 

c.  If  w>w^  ,  carry  out  splitting: 


i. 


Determine  n  such  that  0<w-nw^<w^ 


ii.  Split  the  history  into  n  'daughter'  histories  which 
start  at  this  point  with  weight  wA 

w-nwA. 

iii.  With  probability  - ,  create  one  more  daughter 

WA.; 


ki 


history  with  weight  wA^  . 


5.  In  scoring,  accumulate  the  outcomes  from  all  daughter  his¬ 
tories  which  originated  from  the  same  initial  or  parent  history. 
That  is,  form  estimates 


l,  daughter  of  j 


6.  Form  the  final  estimate  by  averaging  estimates  from  N 
starting  histories 


I 


1 

JT 


N 


A 

I 


J 


and  calculate  the  sample  variance: 


(4.9) 


(4. 10) 


(4.11) 


4.  3. 3  Systematic  Sampling 

There  are  two  ways  to  implement  systematic  sampling.  Both  are 
presented  below  although  it  is  generally  recommended  that  Method  n  be 
used.  The  application  of  systematic  sampling  can  be  generally  most  ef¬ 
fective  when  initial  conditions  for  a  problem  are  selected  from  a  probability 
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distribution,  although  other  applications  can  be  identified.  In  any  event 
it  is  convenient  to  consider  the  usual  integral  form 


I 


g(x)f(x)dx 


Method  I 


(4. 12) 


In  this  method,  systematic  sampling  is  implemented  as  follows: 


1.  The  cumulative  distribution  for  f(x)  is  determined  as  indicated 
in  Fig.  4. 2.  The  range  (0, 1)  is  divided  into  N  intervals,  each 
of  width  1/N  as  indicated.  N  should  vary  between  5  and  50. 


2.  A  random  sample  R< , . . . ,  1^  of  size  n  is  selected  from  the 
uniform  distribution1  U(0, 1). 

3.  Using  the  sequence,  Rj, . . .  ,Rn,  n  numbers  are  allocated  to 
each  interval  using. 


Fig.  4.2.  Systematic  Sampling 
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5. 


Once  the  values  for  X. .  are  obtained,  the  estimator  used  for 
I  is  J 


i  -  ShZ  E  g(V  *  K  Di  (4-15) 

i=l  j=l  i=l 

where 

N 

>i  •  IT  (4'18) 

j  =  l 

6.  Estimate  the  sample  variance  using 

Method  n 


In  this  method,  the  sampling  is  structured  as  follows: 

1.  The  cumulative  distribution  for  f(x)  is  determined  as  indicated 
in  Fig.  4. 2.  The  range  (0, 1)  is  divided  into  N  intervals, 
each  of  width  1/N.  N  should  vary  between  5  and  50. 

2.  n  sets  of  N  random  numbers  each,  R^, . . .  ,R^;. . . ; 

Rnl, . . .  are  selected  from  U(0, 1). 

3.  n  random  numbers  are  allocated  to  each  interval  according 
to 


i  “  1  f  • . . ,  n 
j  =  1,....,N 


4. 


The  values  of 


are  determined  from 


f(x)dx 


(4. 18) 


(4. 19) 
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5.  The  estimator  for  the  integral  I  is  then  obtained  with 


I  - 


hi  z>v-  it1. 

i=l  j=l  i=l 


where 


N 


a 

j=l 


(4.20) 


(4.21) 


6.  The  estimate  for  the  sample  variance  is  obtained  using 


(4.22) 


It  should  be  noted  that  the  difference  between  Method  n  and  Method  I 
is  that  the  random  numbers  are  generated  independently  in  each  of  the  N 
intervals.  This  requires  more  effort  than  Method  I,  although  Method  n 
will  generally  give  better  results. 

4.3.4  Stratified  Sampling 

This  variance  reduction  technique,  sometimes  called  quota  sampling, 
is  similar  to  systematic  sampling  in  that  specific  numbers  of  samples  are 
generated  in  each  of  several  intervals  spanning  the  sample  space.  In  sys¬ 
tematic  sampling  the  number  of  cases  in  each  interval  is  determined  from 
the  'natural'  proportions  of  the  process  being  simulated.  In  stratlfjftf 
sampling,  on  the  other  hand,  the  number  of  samples  in  each  interval  is 
chosen  to  optimize  the  accuracy  of  the  simulation. 
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Stratified  sampling  can  be  implemented  in  the  following  steps: 

1.  Break  the  range  of  the  random  variable  being  simulated  into 
N  intervals  of  length  Lj, . . . ,  Lj.  as  indicated  in  Fig.  4. 3. 
Typically  N  should  be  between  d  and  50.  Each  L.  is 
selected  so  the  variation  in  g(x)f(x)  is  approximately  the 
same. 


2.  Determine  P*,  the  probability  that  x  will  be  in  the  interval  L 
from 


3. 


Arbitrarily  assign  n,'  ;  j  =  1 . N  as  the  number  of 

samples  to  select  from  each  interval  where  In.’  =  n,  the 
total  number  of  samples  desired.  Select  r  i  =  1 . nj; 


J  =  1,...,N  from  U(0, 1). 


4. 


Determine  from 


Rij  pi + 


and  determine 


x* 

J 


f(x)dx 


S 


,2  .  _"i 


i  nj  -1 


n,' 


k± 

i  < -1 


g2(Xjj)  -  r2 


where 


'i  ■  hj  g «*i, 


5.  Determine  n^  using 


1 
n  n 


fffl 


Epis 


(4.24) 


(4.25) 


(4.26) 


(4.27) 


/ 

where  n  is  the  total  sample  size  to  be  selected  li.  e. , 
N  \ 
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6.  Select  R..  ;  i  =  nj  +  1,. . . ,iij  ;  j  =  l,...,N  fromU(0,l) 

and  determine  ;  i  =  nj  +  1, . . .  ,n^  ;  j  =  1, . . .  ,N  from 

/*U 

/  'W*1  (4.28) 

2 

7.  Estimate  I  and  o  using 


A 

I 


(4.29) 


(4.  30) 


where 


(4.31) 


2 

The  efficiency  of  stratified  sampling  can  now  be  estimated  using  S  as 
determined  in  the  last  step. 


If  the 


2 

aJ 


[g(x)  -  Ij]2  dx 


(4. 32) 


are  known  or  can  be  estimated  from  a  priori  knowledge  of  the  system  being 
simulated,  then  steps  3  and  4  can  be  omitted  and  n^  can  be  determined 
directly  from 
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(4.33) 


Alternatively,  steps  3  through  5  can  be  performed  iteratively  to  determine 
a  best  set  of  values  for  n^, . . . ,  nN 

4.3.5  Expected  Value 

In  any  simulation  consisting  of  several  stages,  it  may  that  the 
expected  value  of  some  of  the  stages  is  either  known  or  can  be  determined 
analytically.  In  such  cases  the  possibility  of  achieving  variance  reduction 
through  replacing  one  of  the  random  stages  by  i+s  expected  value  should  be 
investigated.  The  steps  which  must  be  taken  to  determine  if  replacement 
by  an  expected  value  is  feasible  are: 


1 .  Identify  the  stochastic  processes  in  the  overall  simulation  for 
which  the  expected  value  can  be  calculated  efficiently. 

2.  For  each  stochastic  process  identified  in  1. ,  determine  if  the 
random  element  in  the  process  is  an  essential  part  of  the 
simulation  model.  If  the  fact  that  the  process  randomly  takes 
on  a  range  of  values  affects  the  rest  of  the  simulation,  then 
the  process  cannot  be  replaced  by  its  mean  value.  If,  on  the 
other  hand,  only  the  first  moment  of  the  random  distribution, 
and  not  any  higher  moments,  affects  the  rest  of  the  simulation, 
then  it  is  possible  to  replace  the  random  process  by  its  first 
moment  or  expected  value.  For  any  given  physical  system, 
the  determination  of  which  stochastic  elements  are  essential 
usually  depends  on  the  particular  parameters  being  estimated. 

3.  u  a  random  process  can  be  replaced  by  its  expected  value 
without  loss  of  realism,  that  will  always  reduce  the  variance. 
However,  it  may  not  improve  efficiency  as  it  may  cause 
excessive  computation.  If  the  process  in  question  is  a  branch 
point  where  the  history  may  go  in  either  of  two  (or  more)  direc¬ 
tions,  then  replacing  the  stochastic  event  by  its  expected  value 
requires  splitting  the  history  with  each  part  going  in  one  of  the 


111 


directions  and  carrying  the  probability  of  that  branch  as  a 
weight.  Should  enough  of  these  events  be  encountered  the 
number  of  split  histories  which  must  be  computed  can  easily 
expand  beyond  a  reasonable  bound.  Alternatively,  one  of  the 
branches  of  the  decision  can  be  to  terminate  the  history;  in 
this  case  the  history  is  not  split  but  continues  from  the 
branch  point  with  a  weight  representing  the  survival  proba¬ 
bility.  This  can  easily  lead  to  histories  with  very  low  weights 
which  usually  represents  a  loss  in  efficiency  in  the  calculation. 
Again,  this  determination  is  likely  to  depend  on  the  particular 
parameters  of  interest  in  the  calculation  and  it  is  impossible 
to  give  general  guidelines. 

Figure  4.  4  shows,  in  abbreviated  form,  the  considerations  used 
in  choosing  between  expected  value,  statistical  estimation,  and  crude 
Monte  Carlo  techniques  for  the  simulation  of  a  random  process. 

Once  the  decision  has  been  made  to  replace  a  stochastic  process 
by  its  expected  value,  the  implementation  depends  on  the  role  of  the 
process  in  the  overall  simulation.  Specifically, 

1.  If  the  process  is  one  of  selection  of  a  random  variable,  then 
the  process  becomes  merely  a  deterministic  setting  of  the 
variable  to  its  expected  value  and  the  simulation  proceeds  as 
before  with  no  change  in  estimators,  that  is,  if  y  is  to  be 
selected  from  f(y),  then  set 

y  =  E[f(y)]  (4.34) 

and  continue  the  simulation. 

2.  If  the  process  represents  a  decision  between  terminating 
or  not  terminating  the  history,  then  the  history  continues 
but  with  a  reduced  weight  representing  the  probability  of 
survival.  That  is, 

wnew  ~  ^old  (4.35) 

where  ps  is  the  probability  of  survival  (non -termination) 
at  the  decision  point  and  w^  and  wnew  are  the  weights  of 
the  history  before  and  after  the  replaced  random  process. 

For  any  parameter  being  calculated,  an  estimate  for  each 
history  can  be  made  by  summing  the  contributions  from  that 
history.  That  is, 
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Consider  p  Stochastic  Process  that  is 
Part  oi  a  Simulation  Process 


Can  the  Expected 
Value  be  Computed 
Analytically?  . 


Use  Crude  Monte  Carlo 

Simulate  the  Stochastic  Process 
and  Use  Hit  or  Miss  Estimators 


Can  the  Process  be  Replaced 
by  its  Expected  Value  without 
Loss  of  Necessary  Stochastic 
Detail  in  the  Simulation? 


Is  it  Efficient  to  Replace  the  ^ 
Process  by  its  Expected  Value? 

J 


Can  the  Expected  Value  be 
Used  in  Calculating  Contri¬ 
butions  to  the  final  result? 


Use  Sta\as>\\cal  Estimation 


Simulate  the  Stochastic  Process 
but  Use  Expected  Value  Estimator 


Use  Expected  Value  Technique 

Replace  Process  with  Expected 
Value  in  Simulation  and 
Estimators 


Use  Crude  Monte  Carlo 


Simulate  the  Stochastic  Process 
and  Use  Hit  or  Miss  Estimators 


Fig.  4. 4  Problem  Characteristics  and  the  Choice  of  Crude  Monte  Carlo, 
Expected  Value,  and  Statistical  Estimation  Techniques 


(4.  36) 


i4  -  E  wij  B<xi)> 

j 


where  w„ 
to  the  final 
given  by 


th  th 

is  the  weight  of  the  i  history  at  the  time  of  the  j  n  contribution 
result.  Then  the  final  estimate  and  the  sample  varianc  3  are 


I  =  1/N 


(4.37) 


and 


If  the  contributions  to  a  parameter  from  a  history  would  have  come 
from  the  terminations  in  the  process  which  was  replaced  by  its  expected 
value,  then  the  loss  of  weight  at  each  such  step  is  the  proper  estimate  for 
the  expected  terminations.  In  this  case  we  get 


A 

I. 

1 


E(w, 


..  ..-w  ..)-g(X..)  = 

old,  ij  new,  lj  lj 


E 

j 


wold,ii(1-Ps)-g(Xii)  '4- 39) 


where  j  denotes  the  j**1  occurrence  of  the  replaced  event  in  the  i 

»  2 

history.  The  estimates  for  I  and  S  remain  as  in  (4. 37)  and  (4. 38 
above. 


3.  If  the  process  represents  a  decision  between  two  or  more  branch  points, 
then  the  history  must  be  split  and  followed  from  that  point  on  as  two 
separate  histories,  each  taking  a  different  branch  and  carrying  a  weight 
equal  to  the  probability  of  that  branch.  Parameters  are  estimated  by  sum¬ 
ming  weighted  contributions  from  all  daughter  histories  resulting  from  an 
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original  parent  history,  using  formulas  identical  to  (4. 36),  (4. 37)  and  (4. 38). 
In  cases  2  and  3  above,  histories  may  develop  weights  which  are  very  small. 
As  this  may  entail  spending  a  good  deal  of  computing  time  calculating  his¬ 
tories  that  can  make  only  a  trivial  contribution  to  the  result,  the  efficiency 
may  be  very  low.  To  remedy  this,  Russian  Roulette  (see  Section  4. 1. 2)  can 
be  used  to  eliminate  those  histories  where  weights  become  too  small. 


4.3.6  Statistical  Estimation 

It  is  not  essential,  and  frequently  not  efficient,  for  a  simulation  of 
a  physical  process  to  be  carried  out  to  the  natural  termination  of  the  process 
in  estimating  final  outcomes.  It  is  always  proper  to  stop  the  simulation  at 
any  point  and  to  calculate  through  analytic  or  numerical  means  the  expecta¬ 
tion  of  reaching  any  final  outcome.  Indeed,  the  sooner  the  simulation  is 
stopped  and  the  more  analytic  calculations  are  done,  the  lower  the  variance 
will  be.  Obviously,  however,  the  sooner  the  simulation  is  stopped  the  more 
complex  and  difficult  the  analytic  calculations  become  and  a  point  is  quickly 
reached  where  the  overall  efficiency  is  less  despite  the  gain  in  variance  re¬ 
duction.  At  the  last  step  in  the  simulated  process,  the  probability  of  reaching 
the  various  final  outcomes  needs  to  be  determined  in  order  to  do  the  simula¬ 
tion.  Thus,  it  is  generally  advantageous  to  use  analytic  expectations  for  the 
final  step.  Whether  the  analytic  calculations  should  be  carried  beyond  the 
final  step  will  depend  on  the  particular  process  and  results  desired,  but 
generally  it  is  less  efficient  to  use  analytic  expectations  for  more  than  the 
last  step. 

If  the  process  being  simulated  is  a  once-through  process,  i.e. ,  the 
final  step  can  be  reached  only  once  each  history,  then  the  use  of  expected 
outcomes  is  equivalent  to  the  expected  value  technique.  If  the  process  is 
iterative  or  repetitive  with  many  passes  through  a  branch  point  where  a  final 
outcome  is  possible,  tnere  are  two  ways  of  using  the  analytic  computations. 

One  is  by  the  expected  value  technique  as  ou  lined  in  the  previous  section. 

The  other  is  called  statistical  estimation  and  should  be  used  whenever  the 
expected  value  technique  would  be  inefficient.  In  certain  cases  where  the 
probability  of  the  desired  final  outcome  is  extremely  small,  statistical  estima¬ 
tion  may  be  the  only  way  to  obtain  an  answer.  Figure  4. 4  shows,  in  abbrevi¬ 
ated  form,  the  considerations  used  in  choosing  between  statistical  estimation, 
expected  value,  and  crude  Monte  Carlo  techniques  for  the  simulation  of  a 
stochastic  process. 
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Statistical  estimation  is  implemented  as  follows: 

1.  Identify  the  stochastic  processes  in  the  simulation  which  have 
the  desired  final  outcome  as  one  possible  alternative. 

2.  Each  time  such  a  process  is  encountered  in  simulating  a  history, 
a  contribution  of 

g(X,^,)f(^|S)  (4.40) 

is  scored,  where  g(x,  y)  is  the  function  being  integrated  by  the 
simulation,  Yf  is  the  desired  outcome  of  the  particular  process 
at  hand,  X  denotes  the  current  state  of  all  the  other  variables 
in  the  system,  and  f(Yf  8  )  is  the  conditional  probability  of  obtain 
ing  outcome  Yf  given  x  as  the  status  of  the  system. 

3.  Do  not  modify  the  simulation  itself,  but  continue  to  model  the 
stochastic  process  by  drawing  a  random  number  and  probabilis¬ 
tically  selecting  an  outcome,  i.  e. ,  select  a  Y  from  f(y^) 

4.  Do  net  mix  statistical  estimation  with  crude  Monte  Carlo,  i.  e. ,  if 
the  outcome  of  Step  3  turns  out  to  be  Yf ,  no  additional  scoring 
is  made.  The  contribution  at  this  step  remains  g(^,  Y^)f(Y^|Xj). 

5.  Form  an  estimate  for  the  entire  history  by  summing  the  contri¬ 
butions 

*2  g0tIj.Tfc),(7{l*1j)  •  (4-41) 

j 

where  j  runs  over  all  occurrences  of  the  particular  process 
being  estimated  in  the  i*h  history. 

6.  The  final  estimate  is  average^  over  all  histories 

N 

'  -  R 

i  =  l 

and  the  sample  variance  is 

„2  N  [  1  ^  *2  *2 

s  =  tTi  R  2-  l  _I 

L  v=\ 


(4. 42) 


(4. 43) 
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Correlated  sampling  can  be  one  of  the  most  useful  variance 
reduction  techniques  due  to  the  wide  applicability  of  the  technique  as 
well  as  to  the  large  efficiency  gains  which  can  be  realized. 


There  are  several  types  of  situations  where  the  use  of  correlated 
sampling  is  indicated.  These  include: 


•  The  effect  of  a  small  change  in  the  system  is  to  be  calculated. 

•  The  difference  in  a  parameter  in  two  or  more  similar  cases 
is  of  more  interest  than  absolute  values  in  any  one  case. 

•  A  parametric  study  of  several  problems  is  to  be  performed. 
This  has  greatest  potential  when  the  problems  are  relatively 
similar  in  nature. 

•  The  answer  to  one  of  several  similar  problems  is  known 
accurately.  The  answers  to  the  unknown  problems  can  often 
be  estimated  from  the  known  result. 


The  aim  of  correlated  sampling  is  to  produce  a  high  positive  cor¬ 
relation  between  two  similar  simulations  so  that  the  variance  of  the  dif¬ 
ference  in  results  is  considerably  smaller  than  it  would  be  if  the  two  simu¬ 
lations  were  statistically  independent.  Unfortunately,  there  is  no  general 
procedure  that  can  be  implemented  in  correlated  sampling.  However,  the 
following  procedures  can  convey  some  notion  of  the  methods  useful  in 
producing  correlation.  Let  us  begin  by  considering  two  similar  simulations 
involving  only  a  single  variable,  i.e. ,  it  is  desired  to  estimate 


A  =  Il-!2 

where 


(4.44) 


I 


1 


g1(x)f1(x)dx 


(4.45) 
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and 


g2(y)*2(y)dy 


(4.46) 


Then  the  implementation  of  correlated  sampling  proceeds  as  follows: 

1.  Generate  a  random  sample  X- , . . . ,  Xjj  from  fj(x)  and  a 
sample  Yj, . . . ,  YN  from  f2(y)  using 


i  =  l, 


N 


(4.47) 


where  ;  i  =  1, . . . ,  N  is  a  random  sample  from  U(0, 1). 

2.  Estimate  A  using 

.  x  N  N 

i  =  -  gjIY.)]  =  £  y  4j  (4.48) 

where 

A,  •  gjfXj)  -  g^Y,)  (4.49) 

Estimate  the  sample  variance  using 

-*2!  (4.») 


(Batching  may  also  be  used.) 

If  fj(x)  is  similar  to  ^(y),  the  random  samples  Xj, . . .  ,XN 
and  Yj, . ...,Y^  will  be  highly  correlated.  If  gj(x)  is  also  similar  to  gg(y) 
then  the  estimates  will  also  be  highly  correlated.  This  will  greatly  reduce 
the  variance  in  A,  as  the  history  values,  A,,  will  reflect  almost  totally 
the  real  differences  in  gj(x)fj(x)  and  ggfyJfgCy)  and  not  random  "noise" 
due  to  a  difference  in  random  numbers  used. 
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In  the  more  general  case  the  simulation  involves  a  sequence  of 


random  variables  x  =  Xj,x2, . 


. ,  Xj'  and  the  integrals  being  estimated  are 


h=f =  fff ■  •  foxi'x2 . . . . 

,  (4. 51) 

^xklxlx2>  *  *  *  * *  *  *  * 


and  similarly  for  Ig.  The  procedure  now  is  as  follows: 

1.  Identify,  to  the  maximum  extent  possible,  where  identical 
random  numbers  can  be  used  on  both  problems.  Clearly, 
when  parameters  are  changed  between  the  problems,  this 
may  not  always  be  possible.  However,  it  may  be  possible 
to  use  the  same  uniform  random  numbers  throughout.  (In 
sequential  or  multistage  problems  it  may  be  possible  to 
precompute  once  the  portions  of  the  simulation  which  will 

be  identical  in  the  two  cases  and  then  use  these  computations 
in  the  two  simulations,  thereby  reducing  the  computational 
effort  required.) 

2.  For  each  history  i,  generate  a  random  sample  Rji,  ...,11^ 
from  the  unit  uniform  distribution  U(0, 1).  Solve  Tor 

Xij,  • . . ,  X.k  using 

A) 

Ri j  ~J  M*jlXil,Xi2 . Xi(j-l))dxj  (4.52) 

-» 

and  for  Y.  j, . . . ,  Yik  using 


RiJ~f  VyjlYil,Yi2,,,,Yi(j-l^dyj  *4,53* 

•• 

3.  Form  an  estimate  for  each  history 

*  gjfXjj.Xjj . X^)  -  . Yjjj)  (4.54) 

A  A 

=  !li  ‘  *2i 
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4.  Calculate  the  final  estimate  by 
*  *  IT  £  *1 

i=1. 

ami  the  sample  variance 

®2  =  tri 

(Alternatively,  batching  can  be  used  to  calculate  the  variance. ) 

In  most  practical  problems  one  does  not  want  to  develop  a  completely 
new  simulator  to  estimate  the  difference  in  parameters  but  is  rather  faced 
with  the  problem  of  using  an  existing  simulator  program  which  was  designed 
to  solve  a  single  case.  Thus  two  separate  runs  must  be  made,  but  the  cor¬ 
relation  generated  in  step  2  can  be  retained  if  the  basic  random  sample 

Rll’ R12*  *  *‘’Rlk’  R21,,,,,RNk  is  generated  in  both  programs.  Here 

the  property  possessed  by  the  congruential  uniform  random  number  gener¬ 
ator  of  always  producing  the  same  sequence  of  numbers  when  given  the  same 
starting  value  becomes  very  useful.  It  is  then  only  necessary  to  ensure  that 
the  two  separate  runs  start  with  the  same  random  value  and  they  will  con¬ 
tinue  to  generate  the  same  sequence  Rjj,  . . . ,  However,  this  is  not 
quite  enough  for  most  simulations.  It  is  usually  the  case  that  k,  the  num¬ 
ber  of  random  variables  in  a  history,  is  itself  a  random  variable  and  can 
vary  from  one  simulation  to  the  other  due  to  the  difference  in  the  problem 
solved.  Thus,  for  the  maximum  correlation  the  random  number  generator 
should  be  set  at  the  start  of  each  history  to  a  value  that  is  common  in  both 
runs,  i. e. ,  force  the  values  R^,  Rgj,  Rg j, . . . ,  R^  to  be  the  same  in 
both  runs  and  all  the  rest  of  the  random  sequence  will  be  identical  in  the 
two  cases.  If  step  1  identified  portions  of  the  simulation  which  could  be 
identical  in  the  two  cases,  it  would  be  desirable  to  force  common  starting 
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values  on  the  random  number  generator  at  the  start  of  each  such  portion 
of  the  simulation  and  not  just  at  the  start  of  the  history. 

To  generate  values  Rjj,  ^1*^31 . R^j  which  are  themselves 

random  numbers  but  are  consistent  in  the  two  runs,  a  second  random 
number  generator  is  used  which  does  nothing  but  generate  starting  values 
for  the  main  random  number  generator  used  in  the  simulation.  As  this 
second  generator  is  used  only  once  each  history,  it  is  unaffected  by  the 
difference  in  the  two  cases  and  will  generate  identical  starting  values  in 
both  runs.  (Note  that  one  should  use  the  binary  integer  produced  by  the 
second  generator  and  not  the  floating  point  random  number  as  a  starting 
value  for  the  main  random  number  generator.) 

Having  made  two  separate  runs  which  are  correlated,  the  problem 
then  is  to  compute  the  difference  estimates.  To  estimate  the  variance 

A  A 

produced  by  the  correlation  one  must  save  the  estimates,  1^  and  Ig.,  pro¬ 
duced  each  history  (or  each  batch,  if  batching  is  used),  and  then  in  a  sep¬ 
arate  calculation  obtain  the  estimated  difference  and  the  sample  variance 
from  4.  54,  4. 55,  and  4.  56. 

4.3.8  History  Reanalysis 

History  reanalysis  involves  generating  a  series  of  histories  for 
one  case  and  then  reanalyzing  them  to  generate  an  estimate  for  a  similar 
case.  This  combines  the  advantages  of  a  saving  in  computer  time  with  cor¬ 
related  sampling  (Section  4.3. 7)  since  only  one  simulation  was  run  to  get 
two  results  which  are  correlated  due  to  the  use  of  identical  random  numbers. 

Basically,  the  types  of  problems  to  which  history  reanalysis  can 
be  useful  are  a  subset  of  those  where  correlated  sampling  is  useful.  That 
is,  when  differences  in  similar  problems  are  to  be  addressed  or  when 
sensitivity  analyses  are  to  be  performed  (see  Section  4. 3. 7).  In  addition, 
it  is  necessary  that  the  difference  in  the  cases  studied  be  expressable  as  a 
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difference  in  the  random  distributions  used  or  in  the  pay-off  function, 
g(x),  and  not  be  a  difference  in  deterministic  elements  of  the  simulation. 
It  is  commonly  a  sensitivity  analysis  where  history  reanalysis  is  likely 
to  be  most  effective. 


As  in  the  case  of  correlated  sampling,  there  is  no  general  pro¬ 
cedure  that  can  be  followed  in  history  reanalysis.  However,  the  following 
procedure  illustrates  the  general  principles  used  to  derive  the  results  for 
one  problem  (I2)  from  another  problem  (Ij)  where,  as  usual, 


[j  =  f  gjWf^xJdx 
Jmm 


and 


l2  = 


r 

I  g2(x)f2(x)dx . 


(4.57) 


(4.  58) 


1.  Generate  a  random  sample  Xj, . . .  ,XN  from  fj(x). 


2.  Obtain  an  estimate  for  1^  from 


't  *  rE  e,<xi> 

i=l 


and  for  <7j  using 


N 


ftt  IfS  gi(xi}  “  ?i\ 

V  i=l 


(4.59) 


(4.60) 
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3.  Obtain  an  estimate  for  I2  from 


*  i  r-'  w 
g2(xi)  U71 
i=l  1  1 


(4.60) 


and  for  o 2  from 


s:  = 


'  1=1 


*2<Xi> 


W 

m 


-2 


(4.61) 


The  above  procedure  could  clearly  be  used  in  the  analysis  of  several 
other  integrals  and  also  for  the  differences  in  integrals  (as  was  the  case  used 
for  correlated  sampling) . 

In  problems  of  a  sequential  (or  multistage)  nature  there  may  be  several 
points  at  which  reanalysis  of  the  original  problem  is  performed  in  a  manner 
similar  to  that  described  above.  Care  must  be  taken  however  to  avoid  poten¬ 
tial  difficulties  where  branching  decision,  etc.  are  based  on  the  outcome  of 
prior  events  in  the  problem.  These  must  be  appropriately  accounted  for,  but 
the  general  procedure  outlined  above  can  be  useful. 

For  proper  use  of  history  reanalysis,  (x)  cannot  be  zero  for  any 
point  x  where  f2(x)  is  not  also  zero.  The  converse,  however,  is  not  true. 

In  fact  there  is  a  large  set  of  cases  where  history  reanalysis  is  most  useful 
where  gjfc)  is  the  same  as  g2(x)  and  f200  {ijl^  x/R^'  1x1  41118  case 

the  ” weights "  used  in  calculating  I2,  ^CXj)  ,  are  either  1  or  0  and  we  have 
as  a  replacement  for  4.61 


l2  =  NT 


I  BitXj) 


2XJcR2 


and  as  a  sample  variance 


(4.62) 


S2  = 


N, 


2  fi  y 

N2-1  p2  X.c] 


2 

h 


<xi>  -  £i] 
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where  Ng  is  the  number  of  histories  for  which  X 
kind  of  case  consider  a  simulation  of  an  antisubmarine  mission  where  the 
problem  is  limited  by  the  total  mission  time.  It  is  desired  to  calculate  kill 
probabilities  for  a  range  of  mission  times.  The  simulation  is  run  for  the 
longest  time  of  interest,  and  the  histories  can  then  be  reanalyzed  to  deter¬ 
mine  kill  probabilities  for  shorter  times  by  simply  ignoring  the  kills  which 
occur  after  the  time  in  question. 

One  worry  in  history  reanalysis  is  that  fj(x)  may  be  too  differed 
from  t^x)  to  do  a  reasonable  job  of  estimating  Ij.  The  result  may  be  that 
4. 61  will  prove  to  be  an  'overbiased '  or  'underbiased'  estimation.  It  is 
recommended  that  users  be  aware  of  the  considerations  mentioned  in 
Section  2. 5  whenever  using  history  reanalysis. 

4.3.9  Control  Variates 

In  the  calculation  of  an  integral 


As  an  example  of  this 


g(xj  f(x)  dx, 


(4.  63) 


if  an  approximate  function,  h(x )  *  g(x),  can  be  found  such  that 
0  =  J  h(x)  f(x)  dx  is  known  or  can  easily  be  determined  analytically, 

then  the  control  variate  technique  should  be  used. 

In  this  case  the  integral  I  may  be  written  as 


1  *  C  h(x)f(x)dx  +  -  h(x)]f(x)dx 

*  ®  +  [g(x)  -  h(x)]f(x)dx  =  0  + 
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Then,  the  simulation  is  not  performed  on  I  directly,  but  rather  on  the  expected 
difference  between  g(x)  and  h(x),  Ij. 

The  procedure  to  follow  in  implementation  of  control  variates  is 
straightforward.  Namely, 

1.  Express  the  parameter  or  parameters  to  be  estimated  in  integral 
form  as  indicated  above. 

2.  For  each  expected  value,  I,  attempt  to  obtain  an  approximating 
function  h(x)  whose  expected  value,  0,  is  known. 

3.  Structure  the  simulation  such  that  the  difference  between  h(x) 
and  g(x)  given  by 

=  [g(*)-h(x)]f(x)dx  (4.65) 


is  simulated. 

4.  Generate  a  random  sample  X. . XM  from  f(x)  and  estimate  I. 

using  1  w  1 

ij*  IT  £  ttfXj)  -MX,)]  (4.66) 

whose  sample  variance  is  given  by 


1 

KT 


[gfXji-htx,)]2  -  i\ 


(4.67) 


Frequently,  the  real  process  being  simulated  will  give  clues  as  to 
potential  approximating  functions.  However  in  many  cases  an  approximate 
value  for  g(x)  will  not  be  available.  This  can  sometimes  be  achieved  by  the  use 
of  a  sequential  sampling  procedure  in  which  a  few  simulations  are  performed 
to  obtain  an  approximate  representation  to  g(x) .  Clearly,  the  better  the  apnrox 
imation  for  g(x)  that  can  be  obtained,  the  better  the  results  will  be. 

The  extension  of  the  control  variate  concept  to  multiple  dimensional 
integrals  is  clearly  evident  and  is  accompanied  with  the  usual  complications 
associated  with  such  extensions. 
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4.3. 10  Antithetic  Variates 


When  two  estimators  for  a  parameter  of  interest  are  known,  then 
it  is  possible  to  combine  them  to  form  a  third  estimator.  If  the  two 
original  estimators  are  negatively  correlated,  then  the  combined  esti¬ 
mator  can  have  a  variance  which  is  smaller  than  the  variance  of  either 
of  the  original  estimators.  The  usual  method  for  achieving  negative  cor¬ 
relation  is  to  manipulate  the  random  number  generation.  Although  there 
are  many  different  ways  this  can  be  achieved,  the  following  formulation 
(which  uses  a  variation  oi  stratified  sampling)  is  very  easy  to  implement. 

1.  Express,  as  usual,  the  parameter  (or  parameters)  to  be  estimated 
in  integral  form  as 

1  =  g(x)f(x)dx  (4. 68) 


2.  Select  a  value  for  the  parameter  a'O  <  a  <  1)  and  select  X.  and 
X^  for  i  =  1, . . .  ,N  from 

X. 

dR.=jJf(x)dx  I4-69) 

and 

1  -  aR.  =  J*  i  f(x)dx.  (4-  7°) 

where  R^;  i  =  1, . . . ,  N  is  a  random  sample  from  U(0, 1). 

3.  Construct  the  unbiased  estimator  $  using 


(4. 71) 
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where 


;i  N  (4.72) 


52j  (4. 73) 

Selection  of  an  appropriate  value  for  a  is  not  always  clear.  One  use 
of  antithetic  variates  uses  a  =  1/2.  Another  approach  is  to  perform  several 
simulations  for  various  values  of  a  and  estimate  the  efficiency  as  a  function 
of  a. 


=  agfXj)  +  (1  -  4g(X{) 
with  the  sample  variance 


4.3. 11  Regression 


The  application  of  regression  techniques  to  reduce  variance  in  simula¬ 
tions  can  be  associated  with  problems  in  which  a  set  of  integrals  Ij, . . . ,  I 
are  to  be  estimated  from  a  set  of  estimators  . . .  .en(n  *  p)  satisfying 


where  X  is  a  known  n  x  p  matrix  of  the  form 


(4. 74) 


(4.  75) 


Based  on  the  concept  of  minimum  variance  unbiased  estimators,  the  following 
procedure  may  be  used  to  obtain  an  estimate  for  I  using  regression. 
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1.  Perform  a  simulation  N  times  to  obtain  N  values  for  each 

. 0n<  Define  these  values  as 

o  .  k  =  1, . . . , N 
Ki  ’  i  =  1, . . .  ,n 

2.  Obtain  the  sample  means 


i  N 

'i =  u  k|j \i;i=i' 


and  construct  the  matrix 


•  •  f  H 


(4. 76) 


(4.77) 


3.  Estimate  the  covariance  matrix 


A  A 

T1I - vln 

A 


v2Vv2n 


0  0  J 

nf  '  *  ’  nn 


where 


(4. 78) 


Tij  °  (\i  "  ,i)  (^tj  '  8J*  ’  1  '  1 . “ 

j  =  1»  •  •  • » n 

4.  The  unbiased  estimator  for  ?  is  obtained  from 

.»  -*  -*  •* 

A  -*T  A  _1  -*  _1  -4  T  A  -1 A 

I  =  (A  V  1  A)  1  A1  V  le 

-4'T’  -» 

(A  is  the  transpose  of  A. ) 


(4. 79) 


(4.80) 


Ii  is  recommended  that  an  estimate  for  the  sample  variance  be 
obtained  by  batching. 
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APPENDIX  A 


SUMMARY  OF  ANALYTICAL  EXPRESSIONS  FOR 
APPLICATION  OF  VARIANCE  REDUCTION  TECHNIQUES 


A  convenient  summary  of  the  basic  expressions  used  in  implement¬ 
ing  the  more  important  variance  reduction  techniques  is  presented  in  Table 
Al.  For  the  most  part  the  table  is  self  explanatory.  However,  it  ataiuld  be 
noted  that  all  possibilities  are  not  considered.  For  example,  the  results  of 
applying  Russian  Roulette  and  splitting  is  shown  for  a  two-stage  problem 
only. 

Also  it  should  be  noted  that  the  specialized  techniques  which  were 
introduced  in  Part  I  were  not  included  here. 


Pnceitaf  puQm 


1S1 


TABLE  A-l 

Summary  of  Analytical  Expressions  for  Application  of  Variance  Reduction 


if  I  ||  f 
1,1  j  i] 

•  •  •  Sa  m  s 

r  »  «r  Sr  o 


TABLE  A-l  (contd) 
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Bracken,  J. ,  McCormick,  G.  P. ,  "Selected  Applications  of 
Non-Linear  Programming, "  John  Wiley  6  Sons,  New  York,  1968. 

A  book  that  presents  several  selected  optimization  problems.  Of 
particular  interest  here  is  the  application  of  optimization  methods 
to  selection  of  optimal  strata  for  sampling  in  the  sense  of  minimum 
variance. 

Burt,  J.M.  andM.B.  German,  "Conditional  Monte  Carlo:  A  Simu¬ 
lation  Technique  for  Stochastic  Network  Analysis, "  Management 
Science,  18,  No.  3,  207-217,  Nov.  1971. 

This  paper  is  concerned  with  a  simulation  procedure  for  estimating 
the  distribution  functions  of  the  time  to  complete  stochastic  networks. 
The  procedure,  called  conditional  Monte  Carlo,  is  shown  to  be  sub¬ 
stantially  more  efficient  (in  terms  of  the  computational  effort 
required)  than  traditional  simulation  methods.  The  efficiency  of  con¬ 
ditional  Monte  Carlo  and  its  use  in  conjunction  with  other  Monte  Carlo 
methods  is  illustrated  for  the  Wheatstone  bridge  network.  The 
applicability  oi  the  procedure  to  larger  networks,  as  well  as  other 
stochastic  systems,  is  discussed,  and  a  general  method  is  given  for 
its  implementation. 

Clark,  C.E. ,  "Importance  Sampling  in  Monte  Carlo  Analyses, " 
Operations  Research,  603-620,  Sept-Oct.  1961. 

Some  Monte  Carlo  analyses  require  hundreds  of  hours  of  high  speed 
computer  time.  Many  problems  of  current  interest  cannot  be  handled 
because  the  computer  time  required  would  be  too  great.  Statistical 
sampling  procedures  have  been  developed  that  greatly  reduce  the 
required  computer  time.  Importance  sampling  is  one  of  these.  This 
paper  is  an  elementary  description  of  importance  sampling  as  used 
in  Monte  Carlo  analyses. 

Clark,  F.  H. ,  "The  Exponential  Transform  as  an  Importance  Sampling 
Device  -  A  Review, "  Oak  Ridge  National  Laboratory  (AEC)  ORNL- 
RSIC-14,  1-50,  January  1966. 

The  exponential  transform  is  reviewed,  with  emphasis  on  its  use  as  a 
guide  to  effective  importance  sampling  in  the  solution  of  the  Boltzmann 
equation  by  Monte  Carlo  methods.  Contributions  of  various  workers 
are  assembled,  along  with  numerical  results.  Special  consideration  is 
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given  to  approximate  forms  and  to  effective  practical  methods. 

Problems  related  to  negative  effective  cross  sections,  tracking  across 
discontinuities,  directional  biasing  in  inhomogeneous  media,  and  high 
variance  in  back -scattered  components  are  specifically  treated. 

5.  Conveyou,  R.R. ,  V.R.  Cain  and  K.J.  Yost,  ’'Adjoint  and  Importance 
in  Monte  Carlo  Application, "  Nuclear  Science  and  Engineering,  27, 
219-234,  1967. 

The  use  of  the  Monte  Carlo  method  for  the  study  of  deep  penetration 
of  radiation  into  and  through  shields  entails  the  use  of  sophisticated 
methods  of  variance  reduction  to  make  such  calculations  economical 
or  even  feasible.  This  paper  presents  an  exposition  of  the  most  use¬ 
ful  methods  of  variance  reduction.  The  exposition  is  unified  by  con¬ 
sistent  exploitation  of  adjoint  formulations  to  estimate  expected  values, 
as  in  previous  work,  and  further  to  evaluate  the  variance  of  the  resulting 
estimates. 

The  connection  between  adjoint  formulations  and  the  choice  of  biasing 
schemes  is  also  investigated.  In  particular,  it  is  shown  that  the 
value  function  (the  solution  oi  the  Integral  equation  of  the  adjoint 
formulation)  is  always  a  good  choice  for  importance  function  biasing; 
a  sharp  upper  bound,  independent  of  the  particular  problem  is  found 
for  the  resulting  variance.  Predicted  (analytic)  and  experimental 
(Monte  Carlo)  results  are  also  given  for  a  simple  one -dimensional 
problem. 

6.  DeGrott.  M.H.  and  N.  Starr,  "Optimal  Two-Stage  Stratified  Sampling, " 
The  Annals  of  Math.  Statistics,  40,  No.  2,  575-582,  1969. 

This  paper  develops  effective  approximations  to  the  optimal  sampling 
for  situations  where  the  total  number  ol  available  observations  is 
large,  and,  therefore  the  optimal  number  of  observations  that  should 
be  obtained  at  the  first  stage  will  also  be  large  in  a  two  strata  popula¬ 
tion  where  the  sampling  is  accomplished  in  two  stages.  The  techniques 
can  be  extended  to  multi  strata  problems  provided  the  observations  at 
each  strata  have  a  normal  distribution. 

7.  Ehrenfeld,  S.  and  S.  Ben-Tuvia,  'The  Efficiency  of  Statistical  Simu¬ 
lation  Procedures, "  Technometrics,  4,  No.  2,  257-275,  May,  1962. 

Various  methods  for  improving  the  efficiency  of  statistical  simulation 
of  complex  systems  are  described  and  illustrated  for  simple  queueing 
situations.  The  paper  proposes  that  the  efficiency  and  effectiveness 
of  statistical  simulations  can  be  increased  through  the  adaptation  of 
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experimental  design  principles  which  exploit  any  qualitative  knowledge 
surrounding  the  problem  under  study.  Some  techniques  explored  are 
stratified  sampling,  sequential  sampling,  importance  sampling  and  the 
use  of  concomitant  information. 

8.  Evans,  D.  H. ,  "Applied  Multiplex  Sampling,  "  Technometrics,  Vol.  5, 
No.  3.  August  1963. 

Multiplex  sampling  is  an  extension  of  standard  Monte  Carlo  methods 
for  estimating  characteristics  of  the  distribution  of  a  response  when 
the  response  is  a  function  of  several  variables,  each  of  which  comes 
from  a  known  distribution.  The  extension  is  required  when  each 
variable  is  available  in  a  variety  of  distributions.  Depending  on  the 
number  of  variables  there  are  many  possible  different  combinations 
each  of  which,  in  general,  will  give  a  different  distribution  to  the 
response.  If  characteristics  of  the  response  are  to  be  estimated  for 
many  or  all  oi  these  combinations,  there  will  be  a  plethora  of  Monte 
Carlos  to  be  performed  if  usual  procedures  are  followed.  This  in 
turn  can  require  a  great  number  of  observations  of  the  response;  if 
these  are  difficult  to  obtain,  e.  g. ,  if  they  must  be  determined  experi¬ 
mentally,  the  carrying  out  of  such  a  program  can  easily  prove  imprac¬ 
ticable.  Multiplex  sampling  is  a  method  for  estimating  the  character¬ 
istics  for  all  the  different  distributions  for  the  response  by  using  a 
relatively  small  number  of  observations.  This  is  accomplished  by 
sampling  from  an  efficient  sample  space  and  then  using  weighted 
sampling  formulas.  The  functional  form  for  the  probability  density 
function  describing  this  sample  space  is  derived  in  a  companion  paper; 
here  we  assume  this  form  and  consider  the  practical  aspects. 

9.  Fishman,  G.S. ,  "The  Allocation  of  Computer  Time  in  Comparing  Sim¬ 
ulation  Experiments, "  Operations  Research,  16,  280-295,  March- 
April,  1968. 

This  paper  investigates  the  problem  of  efficiently  allocating  computer 
time  between  two  simulation  experiments  when  the  objective  is  to  make 
a  statistical  comparison  of  means.  For  a  given  level  of  accuracy  the 
results  show  that  significantly  less  computer  time  is  required  when 
the  sample  sizes  are  determined  according  to  a  certain  rule  than  when 
the  sample  sizes  are  equal.  A  graphical  analysis  suggests  that  small 
errors  in  estimating  the  population  parameters  of  the  allocation  rule 
do  not  significantly  affect  the  efficient  allocation  of  time.  The  influence 
that  the  degree  o£  autocorrelation  has  on  the  time  allocation  is  also 
investigated;  results  show  that  small  differences  in  the  autocorrelation 
functions  are  important  when  each  process  is  highly  autocorrelated. 
Positively  correlated  samples  for  the  two  experiments  are  examined 
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and  incorporated  into  the  efficient  allocation  rule.  It  is  shown  that 
their  use  leads  to  a  saving  in  computer  time.  A  two-stage  procedure 
is  described  wherein  initial  estimates  of  the  population  parameters 
are  computed  which  permit  the  experimenter  to  estimate  how  many 
more  observations  to  collect  on  each  experiment.  The  procedure  is 
simple  and  straightforward  to  implement  and  should  be  of  practical 
value.  When  the  computer  time  requirements  turn  out  to  be  prohibitive, 
we  suggest  using  negatively  correlated  replications  on  each  experiment. 

This  may  be  accomplished  by  using  antithetic  variates.  The  two-stage 
procedure  also  applies  in  this  case. 

10.  Garman,  M.B. ,  "More  on  Conditioned  Sampling  in  the  Simulation  of 
Stochastic  Networks, "  Management  Science,  Vol.  17,  No.  1, 

September  1972. 

The  technique  of  conditioned  sampling  has  been  shown  to  improve 
simulation  efficiency  in  the  estimation  of  stochastic  activity  network 
duration.  This  paper  describes  a  method  for  generalizing  the  condi¬ 
tioned  sampling  approach  from  its  current  use  of  product-form 
estimators  to  the  use  of  product/convolution-form  estimators.  Esti¬ 
mators  of  the  latter  type  are  constructed  and  demonstrated  to  require 
fewer  samples  per  realization  (hence  increased  estimation  accuracy) 
in  almost  aJi  networks.  An  algorithm  for  estimator  construction  is 
presented  and  proven  to  apply  to  any  given  activity  network.  It  is  also 
shown  that  the  derived  product -convolution -form  estimators  may  require 
a  precedence  structure  within  the  sampling  sequence  which  creates  their 
corresponding  realizations. 

11.  Gaver,  D. P.  Jr.,  "Statistical  Methods  for  Improving  Simulation 
Efficiency, "  Carnegie -Mellon  Universtiy,  Pittsburgh,  Pa.,  August  1969. 

AD694445 

The  paper  presents  a  variety  of  statistical  devices  for  improving  the 
effectiveness  of  computer  simulations  of  random  processes.  The 
methods  are  illustrated  by  examples  from  a  queueing  problem  that  is 
inadequately  treated  by  analytical  approaches. 

12.  Goertzel. ,  G.  and  M.  H.  Kalos,  "Monte  Carlo  Methods  in  Transport 
Problems,"  Progress  in  Nuclear  Science,  Series  I,  Volume  n, 

Pergamon  Press,  p.  315-369. 

The  article  is  devoted  to  the  discussion  of  the  applications  of  the  Monte 
Carlo  method  in  the  field  of  nuclear  energy.  An  account  of  the  theory 
is  given,  including  preliminary  material  on  random  and  pseudorandom 
numbers  and  on  choosing  from  probability  distributions.  The  target 
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game  and  the  transport  game  are  described  in  detail,  with  the 
emphasis  put  on  generality.  The  final  section  deals  with  specific 
applications  to  some  shielding  and  reactor  core  calculations. 

13.  Hague,  J.F. ,  "Variance  Reduction  in  the  Monte  Carlo  Method  for 
Determining  the  Volume  of  Multidimensional  Non  Analytic  Solids,  " 
Nuclear  Instruments  and  Methods,  J7,  194-200,  1967. 

A  Monte  Carlo  method  for  finding  the  volume  of  any  definable  object 
located  within  a  unit  cube  is  considered.  The  method,  which  does 
not  require  the  surface  of  the  solid  to  be  described  by  an  explicit 
function,  is  developed  into  suitable  program  form  and  is  tested  for 
"cylinders,  spheres  and  pyramids  in  2,  4  and  6  dimensions.  Variance 
reduction  factors,  over  straightforward  Monte  Carlo,  of  up  to  30  for 
a  6-dimensional  "cylinder,  "  and  3  lor  a  6-dimensional  "pyramid"  are 
obtained.  An  example  is  given  of  the  application  of  the  method  to  high 
energy  particle  physics. 

14.  Hammersley,  J.M.  and  D.  C.  Handscomb,  Monte  Carlo  Methods. 

Methuen  &  Co.  Ltd.  London,  1964. 

One  of  the  most  useful  references  available  today  on  Monte  Carlo,  it 
presents  the  general  Monte  Carlo  concepts  and  methods,  techniques, 
for  generation  of  random  numbers  and  applications  to  problems  in 
solution  of  linear  equations,  reactor  shielding,  statistical  mechanics 
flow  in  random  media  (percolation  processes)  and  multivariable  systems. 

15.  Hartley,  H.O.  and  J.  Rao,  "Variance  Estimation  in  Linear  Models 
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probability  space.  Indications  are  that,  in  cases  of  interest,  the  vari¬ 
ability  is  thereby  considerably  reduced,  as  is  illustrated  in  an  applica¬ 
tion  concerning  trimmed  and  Winsorized  means. 

32.  Sarndal,  C.  E. ,  "The  Use  of  Stratification  Variables  in  Estimation  by 
Proportional  Stratified  Sampling, "  Amer.  Statistical  Assoc.  J. ,  63, 
1310-1320,  1968. 
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where  the  P.  are  stratum  weights  and  yj  and  X}  are  means  of  the  units 
sampled  from  the  i:th  stratum.  The  two  estimates  are  similar  in  that 
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the  sample.  While  Ia  is  the  more  inexpensive  estimate,  L  is  usually 
preferable  if  one  judges  by  the  size  of  the  mean  square  error,  which, 
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